2023-11-19 16:28:29,738 INFO [train_asr.py:1330] (0/4) Training started
2023-11-19 16:28:29,744 INFO [train_asr.py:1340] (0/4) Device: cuda:0
2023-11-19 16:28:29,747 INFO [train_asr.py:1352] (0/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2b2ac14b326d61d79d04e53fbd69b1ff6d630411', 'k2-git-date': 'Thu Aug 24 05:58:26 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'multi_KD', 'icefall-git-sha1': 'ae3d64ff-dirty', 'icefall-git-date': 'Sun Nov 19 00:54:09 2023', 'icefall-path': '/star-xy/softwares/icefall_development/icefall_multi_KD', 'k2-path': '/star-xy/softwares/k2_development/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-xy/softwares/anaconda3/envs/multi_KD/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-6-0423201309-7c68fd68fb-qfn6b', 'IP address': '10.177.58.19'}, 'world_size': 4, 'master_port': 13490, 'tensorboard': True, 'num_epochs': 40, 'start_epoch': 10, 'start_batch': 0, 'exp_dir': PosixPath('multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'ctc_loss_scale': 0.2, 'audio_tagging_loss_scale': 1.0, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'do_finetune': False, 'init_modules': None, 'freeze_modules': None, 'finetune_ckpt': None, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'do_audio_tagging': True, 'use_encoder_projection': False, 'encoder_projection_dim': -1, 'freeze_encoder': False, 'freezing_encoder_layer_index': '-1', 'freeze_encoder_steps': -1, 'encoder_lr_scale': 1.0, 'full_libri': True, 'mini_libri': False, 'use_vox2': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_audioset': True, 'audioset_subset': 'unbalanced', 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'small.en', 'blank_id': 0, 'vocab_size': 500}
2023-11-19 16:28:29,747 INFO [train_asr.py:1361] (0/4) About to create model
2023-11-19 16:28:30,816 INFO [train_asr.py:1365] (0/4) Number of model parameters: 65819362
2023-11-19 16:28:31,640 INFO [checkpoint.py:112] (0/4) Loading checkpoint from multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-9.pt
2023-11-19 16:28:34,229 INFO [checkpoint.py:131] (0/4) Loading averaged model
2023-11-19 16:28:34,568 INFO [checkpoint.py:112] (0/4) Loading checkpoint from multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-9.pt
2023-11-19 16:28:36,467 INFO [checkpoint.py:131] (0/4) Loading averaged model
2023-11-19 16:28:36,769 INFO [train_asr.py:1396] (0/4) Setting the lr scale of parameters in encoder and encoder_embed to 1.0
2023-11-19 16:28:40,611 INFO [train_asr.py:1405] (0/4) Using DDP
2023-11-19 16:28:40,973 INFO [train_asr.py:1428] (0/4) Loading optimizer state dict
2023-11-19 16:28:41,822 INFO [train_asr.py:1436] (0/4) Loading scheduler state dict
2023-11-19 16:28:41,826 INFO [train_asr.py:1458] (0/4) Getting audioset cuts
2023-11-19 16:28:41,826 INFO [kd_datamodule.py:796] (0/4) About to get the audioset cuts.
2023-11-19 16:28:41,828 INFO [train_asr.py:1464] (0/4) Using mux to combine Librispeech with audioset
2023-11-19 16:28:41,829 INFO [train_asr.py:1474] (0/4) CutSet(len=2748469) [underlying data type: ]
2023-11-19 16:28:57,702 INFO [kd_datamodule.py:396] (0/4) Enable MUSAN
2023-11-19 16:28:57,702 INFO [kd_datamodule.py:397] (0/4) About to get Musan cuts
2023-11-19 16:29:01,178 INFO [kd_datamodule.py:427] (0/4) Enable SpecAugment
2023-11-19 16:29:01,178 INFO [kd_datamodule.py:428] (0/4) Time warp factor: 80
2023-11-19 16:29:01,178 INFO [kd_datamodule.py:438] (0/4) Num frame mask: 10
2023-11-19 16:29:01,179 INFO [kd_datamodule.py:451] (0/4) About to create train dataset
2023-11-19 16:29:01,180 INFO [kd_datamodule.py:487] (0/4) Using SimpleCutSampler
2023-11-19 16:29:01,180 INFO [kd_datamodule.py:495] (0/4) About to create train dataloader
2023-11-19 16:29:01,184 INFO [kd_datamodule.py:814] (0/4) About to get the audioset eval cuts.
2023-11-19 16:29:01,186 INFO [train_asr.py:1538] (0/4) CutSet(len=20681) [underlying data type: ]
2023-11-19 16:29:01,292 INFO [kd_datamodule.py:529] (0/4) About to create dev dataset
2023-11-19 16:29:02,120 INFO [kd_datamodule.py:550] (0/4) About to create dev dataloader
2023-11-19 16:29:02,120 INFO [train_asr.py:1552] (0/4) Loading grad scaler state dict
2023-11-19 16:29:40,872 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 0, loss[loss=0.09472, simple_loss=0.0852, pruned_loss=0.02066, audio_tagging_loss=0.03146, over 16387.00 frames. ], tot_loss[loss=0.09472, simple_loss=0.0852, pruned_loss=0.02066, audio_tagging_loss=0.03146, over 16387.00 frames. ], batch size: 64, lr: 7.12e-03, grad_scale: 32.0
2023-11-19 16:29:40,876 INFO [train_asr.py:1285] (0/4) Computing validation loss
2023-11-19 16:30:07,659 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.7932, 5.8338, 5.8640, 5.9054], device='cuda:0')
2023-11-19 16:30:18,317 INFO [train_asr.py:1294] (0/4) Epoch 10, validation: loss=0.06458, simple_loss=0.05578, pruned_loss=0.006608, audio_tagging_loss=0.03008, over 4681554.00 frames.
2023-11-19 16:30:18,318 INFO [train_asr.py:1295] (0/4) Maximum memory allocated so far is 25925MB
2023-11-19 16:30:20,287 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.83 vs. limit=15.0
2023-11-19 16:30:21,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=721400.0, ans=0.125
2023-11-19 16:30:23,078 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.11 vs. limit=10.0
2023-11-19 16:30:27,536 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.461e+01 9.125e+01 9.697e+01 1.516e+02, threshold=1.825e+02, percent-clipped=0.0
2023-11-19 16:30:43,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=721466.6666666666, ans=0.125
2023-11-19 16:30:43,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=721466.6666666666, ans=0.125
2023-11-19 16:30:44,883 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 16:30:46,422 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=12.0
2023-11-19 16:31:10,017 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 108250
2023-11-19 16:31:15,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=721666.6666666666, ans=0.0
2023-11-19 16:31:18,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=721666.6666666666, ans=0.0
2023-11-19 16:31:23,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=721666.6666666666, ans=0.125
2023-11-19 16:31:26,680 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 50, loss[loss=0.1161, simple_loss=0.1435, pruned_loss=0.02799, audio_tagging_loss=0.01637, over 15596.00 frames. ], tot_loss[loss=0.09729, simple_loss=0.1062, pruned_loss=0.02392, audio_tagging_loss=0.02028, over 696823.56 frames. ], batch size: 55, lr: 7.12e-03, grad_scale: 32.0
2023-11-19 16:31:29,800 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.70 vs. limit=15.0
2023-11-19 16:31:37,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=721733.3333333334, ans=0.0
2023-11-19 16:31:40,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=721800.0, ans=0.125
2023-11-19 16:31:53,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=721866.6666666666, ans=0.2
2023-11-19 16:32:16,153 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 108300
2023-11-19 16:32:16,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=721933.3333333334, ans=0.125
2023-11-19 16:32:20,778 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.061e-01
2023-11-19 16:32:31,692 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 100, loss[loss=0.09076, simple_loss=0.1009, pruned_loss=0.02217, audio_tagging_loss=0.01811, over 14563.00 frames. ], tot_loss[loss=0.09485, simple_loss=0.1042, pruned_loss=0.02304, audio_tagging_loss=0.01971, over 1212268.57 frames. ], batch size: 54, lr: 7.12e-03, grad_scale: 32.0
2023-11-19 16:32:40,255 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.545e+01 8.782e+01 9.608e+01 1.042e+02 1.365e+02, threshold=1.922e+02, percent-clipped=0.0
2023-11-19 16:32:46,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=722133.3333333334, ans=0.035
2023-11-19 16:32:49,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=722133.3333333334, ans=0.0
2023-11-19 16:33:05,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=722200.0, ans=0.2
2023-11-19 16:33:18,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=722266.6666666666, ans=0.0
2023-11-19 16:33:20,363 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 108350
2023-11-19 16:33:26,095 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.75 vs. limit=6.0
2023-11-19 16:33:35,190 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 150, loss[loss=0.08075, simple_loss=0.09147, pruned_loss=0.0206, audio_tagging_loss=0.01441, over 15690.00 frames. ], tot_loss[loss=0.09157, simple_loss=0.1023, pruned_loss=0.02268, audio_tagging_loss=0.01774, over 1620229.62 frames. ], batch size: 59, lr: 7.12e-03, grad_scale: 32.0
2023-11-19 16:34:21,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=722600.0, ans=0.125
2023-11-19 16:34:24,515 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 108400
2023-11-19 16:34:25,134 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.78 vs. limit=15.0
2023-11-19 16:34:27,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=722666.6666666666, ans=0.1
2023-11-19 16:34:39,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=722733.3333333334, ans=0.125
2023-11-19 16:34:40,709 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 200, loss[loss=0.0845, simple_loss=0.1059, pruned_loss=0.02169, audio_tagging_loss=0.009868, over 16233.00 frames. ], tot_loss[loss=0.09055, simple_loss=0.1039, pruned_loss=0.02306, audio_tagging_loss=0.01552, over 1936905.68 frames. ], batch size: 61, lr: 7.12e-03, grad_scale: 16.0
2023-11-19 16:34:44,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=722733.3333333334, ans=0.125
2023-11-19 16:34:46,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=722733.3333333334, ans=0.125
2023-11-19 16:34:51,747 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.543e+01 8.378e+01 9.256e+01 1.031e+02 1.304e+02, threshold=1.851e+02, percent-clipped=0.0
2023-11-19 16:35:06,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=722866.6666666666, ans=0.125
2023-11-19 16:35:08,518 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.53 vs. limit=12.0
2023-11-19 16:35:11,039 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.40 vs. limit=10.0
2023-11-19 16:35:20,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=722933.3333333334, ans=0.1
2023-11-19 16:35:23,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=722933.3333333334, ans=0.2
2023-11-19 16:35:26,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=722933.3333333334, ans=0.0
2023-11-19 16:35:29,487 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 108450
2023-11-19 16:35:29,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=722933.3333333334, ans=0.0
2023-11-19 16:35:43,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=723000.0, ans=0.125
2023-11-19 16:35:43,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=723000.0, ans=0.0
2023-11-19 16:35:45,265 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 250, loss[loss=0.0921, simple_loss=0.1075, pruned_loss=0.02546, audio_tagging_loss=0.01288, over 16276.00 frames. ], tot_loss[loss=0.08895, simple_loss=0.1039, pruned_loss=0.0229, audio_tagging_loss=0.0141, over 2186541.25 frames. ], batch size: 60, lr: 7.12e-03, grad_scale: 16.0
2023-11-19 16:35:51,941 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.59 vs. limit=15.0
2023-11-19 16:36:04,343 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.90 vs. limit=15.0
2023-11-19 16:36:31,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=723266.6666666666, ans=0.05
2023-11-19 16:36:33,718 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 108500
2023-11-19 16:36:47,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=723400.0, ans=0.2
2023-11-19 16:36:48,238 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 300, loss[loss=0.0536, simple_loss=0.05271, pruned_loss=0.009946, audio_tagging_loss=0.0173, over 16226.00 frames. ], tot_loss[loss=0.08942, simple_loss=0.1057, pruned_loss=0.02359, audio_tagging_loss=0.01298, over 2379975.58 frames. ], batch size: 64, lr: 7.11e-03, grad_scale: 16.0
2023-11-19 16:36:54,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=723400.0, ans=0.1
2023-11-19 16:36:58,042 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.707e+01 8.557e+01 9.217e+01 9.967e+01 1.431e+02, threshold=1.843e+02, percent-clipped=0.0
2023-11-19 16:37:17,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=723533.3333333334, ans=0.125
2023-11-19 16:37:25,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=723533.3333333334, ans=0.1
2023-11-19 16:37:33,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=723600.0, ans=0.0
2023-11-19 16:37:37,034 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 108550
2023-11-19 16:37:39,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=723666.6666666666, ans=0.0
2023-11-19 16:37:43,939 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.49 vs. limit=15.0
2023-11-19 16:37:51,882 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 350, loss[loss=0.0982, simple_loss=0.1107, pruned_loss=0.03194, audio_tagging_loss=0.01091, over 14682.00 frames. ], tot_loss[loss=0.08815, simple_loss=0.1048, pruned_loss=0.02333, audio_tagging_loss=0.01243, over 2529442.60 frames. ], batch size: 58, lr: 7.11e-03, grad_scale: 16.0
2023-11-19 16:37:56,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=723733.3333333334, ans=0.125
2023-11-19 16:37:56,889 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.85 vs. limit=15.0
2023-11-19 16:38:10,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=723800.0, ans=0.0
2023-11-19 16:38:40,130 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 108600
2023-11-19 16:38:42,709 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.31 vs. limit=15.0
2023-11-19 16:38:55,823 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 400, loss[loss=0.07304, simple_loss=0.09186, pruned_loss=0.01942, audio_tagging_loss=0.007697, over 15208.00 frames. ], tot_loss[loss=0.08672, simple_loss=0.1038, pruned_loss=0.02296, audio_tagging_loss=0.01186, over 2649036.26 frames. ], batch size: 56, lr: 7.11e-03, grad_scale: 32.0
2023-11-19 16:39:02,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=724066.6666666666, ans=0.0
2023-11-19 16:39:06,216 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.939e+01 8.916e+01 9.621e+01 1.044e+02 1.431e+02, threshold=1.924e+02, percent-clipped=0.0
2023-11-19 16:39:09,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=724133.3333333334, ans=0.0
2023-11-19 16:39:10,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=724133.3333333334, ans=0.125
2023-11-19 16:39:40,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=724266.6666666666, ans=0.0
2023-11-19 16:39:44,953 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 108650
2023-11-19 16:39:59,918 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 450, loss[loss=0.1005, simple_loss=0.1248, pruned_loss=0.0279, audio_tagging_loss=0.01018, over 16095.00 frames. ], tot_loss[loss=0.0863, simple_loss=0.1037, pruned_loss=0.02295, audio_tagging_loss=0.0115, over 2730843.68 frames. ], batch size: 58, lr: 7.11e-03, grad_scale: 32.0
2023-11-19 16:40:37,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=724600.0, ans=0.125
2023-11-19 16:40:48,036 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 108700
2023-11-19 16:41:02,584 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 500, loss[loss=0.08178, simple_loss=0.1045, pruned_loss=0.02018, audio_tagging_loss=0.009357, over 15640.00 frames. ], tot_loss[loss=0.08618, simple_loss=0.1039, pruned_loss=0.02299, audio_tagging_loss=0.01123, over 2811845.76 frames. ], batch size: 59, lr: 7.11e-03, grad_scale: 32.0
2023-11-19 16:41:13,955 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.793e+01 8.347e+01 9.313e+01 1.051e+02 1.429e+02, threshold=1.863e+02, percent-clipped=0.0
2023-11-19 16:41:18,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=724800.0, ans=0.125
2023-11-19 16:41:22,018 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.35 vs. limit=15.0
2023-11-19 16:41:38,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=724866.6666666666, ans=0.0
2023-11-19 16:41:45,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=724933.3333333334, ans=0.0
2023-11-19 16:41:51,539 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 108750
2023-11-19 16:41:51,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=724933.3333333334, ans=0.0
2023-11-19 16:42:04,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=725000.0, ans=10.0
2023-11-19 16:42:06,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=725066.6666666666, ans=0.0
2023-11-19 16:42:07,387 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 550, loss[loss=0.1036, simple_loss=0.1267, pruned_loss=0.03004, audio_tagging_loss=0.01022, over 15141.00 frames. ], tot_loss[loss=0.08665, simple_loss=0.1049, pruned_loss=0.02318, audio_tagging_loss=0.01104, over 2869411.45 frames. ], batch size: 56, lr: 7.11e-03, grad_scale: 32.0
2023-11-19 16:42:07,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=725066.6666666666, ans=0.0
2023-11-19 16:42:16,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=725066.6666666666, ans=0.125
2023-11-19 16:42:31,135 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.06 vs. limit=15.0
2023-11-19 16:42:31,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=725200.0, ans=0.1
2023-11-19 16:42:34,999 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0
2023-11-19 16:42:39,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=725200.0, ans=0.0
2023-11-19 16:42:41,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=725200.0, ans=0.1
2023-11-19 16:42:50,727 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.47 vs. limit=15.0
2023-11-19 16:42:56,349 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 108800
2023-11-19 16:42:59,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=725333.3333333334, ans=0.125
2023-11-19 16:43:12,328 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 600, loss[loss=0.06508, simple_loss=0.0762, pruned_loss=0.01569, audio_tagging_loss=0.01129, over 14756.00 frames. ], tot_loss[loss=0.08618, simple_loss=0.1046, pruned_loss=0.02297, audio_tagging_loss=0.01093, over 2914915.27 frames. ], batch size: 56, lr: 7.10e-03, grad_scale: 32.0
2023-11-19 16:43:21,974 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.124e+01 8.186e+01 8.803e+01 9.595e+01 1.577e+02, threshold=1.761e+02, percent-clipped=0.0
2023-11-19 16:43:29,691 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 16:43:50,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=725600.0, ans=0.125
2023-11-19 16:43:50,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=725600.0, ans=0.125
2023-11-19 16:44:01,065 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 108850
2023-11-19 16:44:08,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=725666.6666666666, ans=0.07
2023-11-19 16:44:15,750 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 650, loss[loss=0.0701, simple_loss=0.08908, pruned_loss=0.01634, audio_tagging_loss=0.009223, over 15004.00 frames. ], tot_loss[loss=0.0866, simple_loss=0.1052, pruned_loss=0.02318, audio_tagging_loss=0.01083, over 2946544.78 frames. ], batch size: 57, lr: 7.10e-03, grad_scale: 16.0
2023-11-19 16:44:16,381 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.11 vs. limit=12.0
2023-11-19 16:44:32,501 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.03 vs. limit=15.0
2023-11-19 16:44:33,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=725800.0, ans=0.125
2023-11-19 16:44:37,354 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.15 vs. limit=10.0
2023-11-19 16:44:43,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=725866.6666666666, ans=0.125
2023-11-19 16:44:43,572 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0
2023-11-19 16:44:46,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=725866.6666666666, ans=0.05
2023-11-19 16:45:03,281 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.25 vs. limit=22.5
2023-11-19 16:45:04,121 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 108900
2023-11-19 16:45:08,354 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.81 vs. limit=6.0
2023-11-19 16:45:20,037 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 700, loss[loss=0.1127, simple_loss=0.1459, pruned_loss=0.03287, audio_tagging_loss=0.006886, over 15389.00 frames. ], tot_loss[loss=0.08765, simple_loss=0.1068, pruned_loss=0.02353, audio_tagging_loss=0.01073, over 2972090.39 frames. ], batch size: 55, lr: 7.10e-03, grad_scale: 16.0
2023-11-19 16:45:20,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=726066.6666666666, ans=0.2
2023-11-19 16:45:30,786 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 8.106e+01 8.886e+01 9.595e+01 1.122e+02, threshold=1.777e+02, percent-clipped=0.0
2023-11-19 16:45:57,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=726266.6666666666, ans=0.2
2023-11-19 16:46:07,909 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 108950
2023-11-19 16:46:11,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=726333.3333333334, ans=0.1
2023-11-19 16:46:11,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=726333.3333333334, ans=0.0
2023-11-19 16:46:14,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=726333.3333333334, ans=0.125
2023-11-19 16:46:22,504 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 750, loss[loss=0.07144, simple_loss=0.08319, pruned_loss=0.01744, audio_tagging_loss=0.0124, over 14694.00 frames. ], tot_loss[loss=0.08777, simple_loss=0.1068, pruned_loss=0.02369, audio_tagging_loss=0.01071, over 2981103.07 frames. ], batch size: 58, lr: 7.10e-03, grad_scale: 16.0
2023-11-19 16:47:11,213 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 109000
2023-11-19 16:47:23,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=726666.6666666666, ans=0.125
2023-11-19 16:47:24,069 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.30 vs. limit=15.0
2023-11-19 16:47:27,213 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 800, loss[loss=0.08213, simple_loss=0.09649, pruned_loss=0.02139, audio_tagging_loss=0.01249, over 14404.00 frames. ], tot_loss[loss=0.08787, simple_loss=0.1068, pruned_loss=0.02365, audio_tagging_loss=0.01081, over 2997080.60 frames. ], batch size: 55, lr: 7.10e-03, grad_scale: 32.0
2023-11-19 16:47:34,998 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.50 vs. limit=15.0
2023-11-19 16:47:38,642 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.715e+01 8.461e+01 9.150e+01 1.030e+02 1.294e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-19 16:48:08,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=726933.3333333334, ans=0.125
2023-11-19 16:48:11,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=726933.3333333334, ans=0.2
2023-11-19 16:48:15,678 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 109050
2023-11-19 16:48:18,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=727000.0, ans=0.1
2023-11-19 16:48:19,845 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.83 vs. limit=15.0
2023-11-19 16:48:31,299 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 850, loss[loss=0.08928, simple_loss=0.1044, pruned_loss=0.02577, audio_tagging_loss=0.01134, over 15177.00 frames. ], tot_loss[loss=0.08874, simple_loss=0.1075, pruned_loss=0.02407, audio_tagging_loss=0.01091, over 3007263.05 frames. ], batch size: 58, lr: 7.10e-03, grad_scale: 32.0
2023-11-19 16:48:39,124 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.83 vs. limit=12.0
2023-11-19 16:49:08,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=727266.6666666666, ans=0.125
2023-11-19 16:49:15,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=727266.6666666666, ans=0.125
2023-11-19 16:49:19,693 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 109100
2023-11-19 16:49:22,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=727333.3333333334, ans=0.02
2023-11-19 16:49:25,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=727333.3333333334, ans=0.125
2023-11-19 16:49:33,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=727400.0, ans=0.125
2023-11-19 16:49:34,209 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 900, loss[loss=0.09415, simple_loss=0.1207, pruned_loss=0.02386, audio_tagging_loss=0.009917, over 15488.00 frames. ], tot_loss[loss=0.08815, simple_loss=0.1067, pruned_loss=0.02383, audio_tagging_loss=0.01096, over 3017905.72 frames. ], batch size: 56, lr: 7.09e-03, grad_scale: 32.0
2023-11-19 16:49:45,199 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.987e+01 8.269e+01 9.055e+01 9.679e+01 1.261e+02, threshold=1.811e+02, percent-clipped=0.0
2023-11-19 16:49:53,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=727466.6666666666, ans=0.1
2023-11-19 16:50:06,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=727533.3333333334, ans=0.1
2023-11-19 16:50:22,797 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 109150
2023-11-19 16:50:37,247 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 950, loss[loss=0.0896, simple_loss=0.1118, pruned_loss=0.0221, audio_tagging_loss=0.01161, over 15432.00 frames. ], tot_loss[loss=0.08773, simple_loss=0.1063, pruned_loss=0.02365, audio_tagging_loss=0.01094, over 3017800.71 frames. ], batch size: 58, lr: 7.09e-03, grad_scale: 32.0
2023-11-19 16:50:40,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=727733.3333333334, ans=0.125
2023-11-19 16:50:44,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=727733.3333333334, ans=0.07
2023-11-19 16:50:57,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=727800.0, ans=0.0
2023-11-19 16:51:02,123 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.08 vs. limit=15.0
2023-11-19 16:51:14,229 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.47 vs. limit=15.0
2023-11-19 16:51:25,984 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 109200
2023-11-19 16:51:42,986 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 1000, loss[loss=0.1211, simple_loss=0.149, pruned_loss=0.03709, audio_tagging_loss=0.009452, over 15483.00 frames. ], tot_loss[loss=0.08729, simple_loss=0.106, pruned_loss=0.02357, audio_tagging_loss=0.01073, over 3020248.07 frames. ], batch size: 55, lr: 7.09e-03, grad_scale: 32.0
2023-11-19 16:51:49,715 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.76 vs. limit=6.0
2023-11-19 16:51:53,754 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.418e+01 8.048e+01 8.889e+01 9.743e+01 1.398e+02, threshold=1.778e+02, percent-clipped=0.0
2023-11-19 16:52:01,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=728133.3333333334, ans=0.0
2023-11-19 16:52:04,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=728133.3333333334, ans=0.125
2023-11-19 16:52:08,478 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 16:52:24,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=728266.6666666666, ans=0.2
2023-11-19 16:52:31,590 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 109250
2023-11-19 16:52:46,314 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 1050, loss[loss=0.1039, simple_loss=0.134, pruned_loss=0.02939, audio_tagging_loss=0.007535, over 15537.00 frames. ], tot_loss[loss=0.08662, simple_loss=0.1052, pruned_loss=0.02336, audio_tagging_loss=0.01068, over 3025208.64 frames. ], batch size: 54, lr: 7.09e-03, grad_scale: 32.0
2023-11-19 16:52:56,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=728400.0, ans=0.5
2023-11-19 16:53:09,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=728466.6666666666, ans=0.125
2023-11-19 16:53:10,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.whiten.whitening_limit, batch_count=728466.6666666666, ans=15.0
2023-11-19 16:53:15,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=728533.3333333334, ans=0.125
2023-11-19 16:53:25,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=728600.0, ans=0.0
2023-11-19 16:53:35,473 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 109300
2023-11-19 16:53:49,974 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 1100, loss[loss=0.09285, simple_loss=0.1125, pruned_loss=0.0259, audio_tagging_loss=0.01069, over 13321.00 frames. ], tot_loss[loss=0.08693, simple_loss=0.1058, pruned_loss=0.02348, audio_tagging_loss=0.01056, over 3028508.38 frames. ], batch size: 52, lr: 7.09e-03, grad_scale: 32.0
2023-11-19 16:53:52,578 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 16:54:01,855 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.245e+01 8.411e+01 9.070e+01 1.020e+02 1.382e+02, threshold=1.814e+02, percent-clipped=0.0
2023-11-19 16:54:17,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=728866.6666666666, ans=0.2
2023-11-19 16:54:38,806 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 109350
2023-11-19 16:54:54,104 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 1150, loss[loss=0.06168, simple_loss=0.07946, pruned_loss=0.01212, audio_tagging_loss=0.009834, over 15309.00 frames. ], tot_loss[loss=0.08666, simple_loss=0.1056, pruned_loss=0.0234, audio_tagging_loss=0.01047, over 3031836.78 frames. ], batch size: 59, lr: 7.09e-03, grad_scale: 32.0
2023-11-19 16:55:26,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=729200.0, ans=0.1
2023-11-19 16:55:42,486 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 109400
2023-11-19 16:55:51,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=729333.3333333334, ans=0.0
2023-11-19 16:55:58,767 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 1200, loss[loss=0.09912, simple_loss=0.1209, pruned_loss=0.02915, audio_tagging_loss=0.009489, over 13733.00 frames. ], tot_loss[loss=0.08585, simple_loss=0.1047, pruned_loss=0.0231, audio_tagging_loss=0.0104, over 3035093.69 frames. ], batch size: 54, lr: 7.08e-03, grad_scale: 32.0
2023-11-19 16:56:09,733 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.219e+01 8.170e+01 9.038e+01 9.712e+01 1.366e+02, threshold=1.808e+02, percent-clipped=0.0
2023-11-19 16:56:17,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=729466.6666666666, ans=0.125
2023-11-19 16:56:17,959 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.21 vs. limit=12.0
2023-11-19 16:56:18,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=729466.6666666666, ans=0.125
2023-11-19 16:56:46,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=729600.0, ans=0.2
2023-11-19 16:56:47,487 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 109450
2023-11-19 16:57:02,015 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 1250, loss[loss=0.06309, simple_loss=0.08212, pruned_loss=0.01238, audio_tagging_loss=0.009642, over 14903.00 frames. ], tot_loss[loss=0.08664, simple_loss=0.106, pruned_loss=0.02331, audio_tagging_loss=0.01032, over 3038127.19 frames. ], batch size: 57, lr: 7.08e-03, grad_scale: 32.0
2023-11-19 16:57:02,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=729733.3333333334, ans=0.0
2023-11-19 16:57:04,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=729733.3333333334, ans=0.0
2023-11-19 16:57:08,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=729733.3333333334, ans=0.125
2023-11-19 16:57:25,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=729800.0, ans=0.125
2023-11-19 16:57:51,017 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 109500
2023-11-19 16:58:05,792 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 1300, loss[loss=0.07859, simple_loss=0.09847, pruned_loss=0.01882, audio_tagging_loss=0.01054, over 15511.00 frames. ], tot_loss[loss=0.08585, simple_loss=0.1052, pruned_loss=0.02297, audio_tagging_loss=0.01028, over 3035527.28 frames. ], batch size: 60, lr: 7.08e-03, grad_scale: 32.0
2023-11-19 16:58:11,800 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.11 vs. limit=8.0
2023-11-19 16:58:16,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=730066.6666666666, ans=0.0
2023-11-19 16:58:18,233 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.587e+01 8.086e+01 8.673e+01 9.719e+01 1.253e+02, threshold=1.735e+02, percent-clipped=0.0
2023-11-19 16:58:31,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=730200.0, ans=10.0
2023-11-19 16:58:38,493 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.23 vs. limit=15.0
2023-11-19 16:58:44,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=730266.6666666666, ans=0.0
2023-11-19 16:58:44,682 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.77 vs. limit=22.5
2023-11-19 16:58:53,110 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.538e-01
2023-11-19 16:58:54,104 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 109550
2023-11-19 16:59:01,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=730333.3333333334, ans=0.125
2023-11-19 16:59:07,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=730333.3333333334, ans=0.125
2023-11-19 16:59:10,774 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 1350, loss[loss=0.08256, simple_loss=0.08965, pruned_loss=0.02492, audio_tagging_loss=0.01282, over 14487.00 frames. ], tot_loss[loss=0.08643, simple_loss=0.1058, pruned_loss=0.02322, audio_tagging_loss=0.01032, over 3036412.60 frames. ], batch size: 56, lr: 7.08e-03, grad_scale: 32.0
2023-11-19 16:59:24,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=730466.6666666666, ans=0.125
2023-11-19 16:59:49,631 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.58 vs. limit=12.0
2023-11-19 16:59:56,678 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 16:59:59,161 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 109600
2023-11-19 16:59:59,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=730600.0, ans=0.125
2023-11-19 17:00:14,635 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 1400, loss[loss=0.1018, simple_loss=0.1215, pruned_loss=0.03204, audio_tagging_loss=0.009028, over 14990.00 frames. ], tot_loss[loss=0.08612, simple_loss=0.1049, pruned_loss=0.02324, audio_tagging_loss=0.01043, over 3036387.67 frames. ], batch size: 57, lr: 7.08e-03, grad_scale: 32.0
2023-11-19 17:00:15,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=730733.3333333334, ans=0.2
2023-11-19 17:00:16,207 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-19 17:00:24,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=730733.3333333334, ans=0.125
2023-11-19 17:00:25,873 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.607e+01 8.437e+01 9.173e+01 9.925e+01 1.308e+02, threshold=1.835e+02, percent-clipped=0.0
2023-11-19 17:00:28,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=730800.0, ans=10.0
2023-11-19 17:00:38,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=730800.0, ans=0.125
2023-11-19 17:00:45,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=730866.6666666666, ans=0.125
2023-11-19 17:01:03,371 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 109650
2023-11-19 17:01:07,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=731000.0, ans=0.125
2023-11-19 17:01:18,043 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 1450, loss[loss=0.09945, simple_loss=0.1121, pruned_loss=0.03341, audio_tagging_loss=0.009969, over 14604.00 frames. ], tot_loss[loss=0.08712, simple_loss=0.1059, pruned_loss=0.02372, audio_tagging_loss=0.01044, over 3036467.63 frames. ], batch size: 56, lr: 7.08e-03, grad_scale: 16.0
2023-11-19 17:01:28,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=731066.6666666666, ans=0.1
2023-11-19 17:01:36,260 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.68 vs. limit=15.0
2023-11-19 17:01:37,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=731133.3333333334, ans=0.125
2023-11-19 17:01:50,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=731200.0, ans=0.0
2023-11-19 17:02:06,409 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 109700
2023-11-19 17:02:22,228 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 1500, loss[loss=0.08337, simple_loss=0.1009, pruned_loss=0.02195, audio_tagging_loss=0.01096, over 15309.00 frames. ], tot_loss[loss=0.08746, simple_loss=0.1063, pruned_loss=0.02378, audio_tagging_loss=0.01051, over 3032608.98 frames. ], batch size: 57, lr: 7.08e-03, grad_scale: 16.0
2023-11-19 17:02:34,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=731466.6666666666, ans=0.2
2023-11-19 17:02:35,057 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.251e+01 8.288e+01 9.153e+01 9.955e+01 1.243e+02, threshold=1.831e+02, percent-clipped=0.0
2023-11-19 17:02:58,664 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 17:03:07,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=731600.0, ans=0.125
2023-11-19 17:03:08,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=731600.0, ans=0.125
2023-11-19 17:03:11,040 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 109750
2023-11-19 17:03:17,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=731666.6666666666, ans=0.125
2023-11-19 17:03:18,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=731666.6666666666, ans=0.1
2023-11-19 17:03:26,127 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 1550, loss[loss=0.07602, simple_loss=0.08667, pruned_loss=0.0221, audio_tagging_loss=0.01058, over 14566.00 frames. ], tot_loss[loss=0.08647, simple_loss=0.1047, pruned_loss=0.02337, audio_tagging_loss=0.01074, over 3036175.76 frames. ], batch size: 56, lr: 7.07e-03, grad_scale: 16.0
2023-11-19 17:03:35,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=731733.3333333334, ans=0.125
2023-11-19 17:03:37,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=731800.0, ans=0.1
2023-11-19 17:03:40,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=731800.0, ans=0.125
2023-11-19 17:03:50,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=731866.6666666666, ans=0.125
2023-11-19 17:03:54,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=731866.6666666666, ans=0.125
2023-11-19 17:04:14,521 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 109800
2023-11-19 17:04:29,614 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 1600, loss[loss=0.08654, simple_loss=0.1124, pruned_loss=0.02063, audio_tagging_loss=0.009711, over 16875.00 frames. ], tot_loss[loss=0.08553, simple_loss=0.1035, pruned_loss=0.02299, audio_tagging_loss=0.01077, over 3033502.35 frames. ], batch size: 62, lr: 7.07e-03, grad_scale: 32.0
2023-11-19 17:04:42,924 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.880e+01 8.694e+01 9.571e+01 1.026e+02 1.392e+02, threshold=1.914e+02, percent-clipped=0.0
2023-11-19 17:04:43,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=732133.3333333334, ans=0.09899494936611666
2023-11-19 17:04:58,074 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.16 vs. limit=8.0
2023-11-19 17:05:04,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=732200.0, ans=0.125
2023-11-19 17:05:04,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=732200.0, ans=0.0
2023-11-19 17:05:18,469 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 109850
2023-11-19 17:05:30,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=732333.3333333334, ans=0.0
2023-11-19 17:05:33,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=732400.0, ans=0.2
2023-11-19 17:05:34,213 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 1650, loss[loss=0.0855, simple_loss=0.1069, pruned_loss=0.02218, audio_tagging_loss=0.009858, over 14752.00 frames. ], tot_loss[loss=0.08579, simple_loss=0.104, pruned_loss=0.02301, audio_tagging_loss=0.01078, over 3039975.71 frames. ], batch size: 53, lr: 7.07e-03, grad_scale: 32.0
2023-11-19 17:05:49,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=732466.6666666666, ans=0.0
2023-11-19 17:06:07,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=732533.3333333334, ans=0.025
2023-11-19 17:06:09,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=732533.3333333334, ans=0.125
2023-11-19 17:06:22,807 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 109900
2023-11-19 17:06:24,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=732666.6666666666, ans=0.1
2023-11-19 17:06:38,160 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 1700, loss[loss=0.08223, simple_loss=0.1001, pruned_loss=0.02156, audio_tagging_loss=0.01063, over 15554.00 frames. ], tot_loss[loss=0.0849, simple_loss=0.1033, pruned_loss=0.0225, audio_tagging_loss=0.01077, over 3052382.24 frames. ], batch size: 60, lr: 7.07e-03, grad_scale: 32.0
2023-11-19 17:06:44,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=732733.3333333334, ans=0.0
2023-11-19 17:06:47,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=732733.3333333334, ans=0.125
2023-11-19 17:06:50,475 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.181e+01 8.219e+01 8.857e+01 9.747e+01 1.189e+02, threshold=1.771e+02, percent-clipped=0.0
2023-11-19 17:06:55,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=732800.0, ans=0.1
2023-11-19 17:06:57,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=732800.0, ans=0.5
2023-11-19 17:07:05,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=732866.6666666666, ans=0.125
2023-11-19 17:07:27,202 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 109950
2023-11-19 17:07:33,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=733000.0, ans=0.125
2023-11-19 17:07:34,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=733000.0, ans=0.1
2023-11-19 17:07:41,742 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 1750, loss[loss=0.09159, simple_loss=0.1151, pruned_loss=0.02618, audio_tagging_loss=0.007859, over 15367.00 frames. ], tot_loss[loss=0.08458, simple_loss=0.1031, pruned_loss=0.02241, audio_tagging_loss=0.01065, over 3053539.99 frames. ], batch size: 55, lr: 7.07e-03, grad_scale: 32.0
2023-11-19 17:07:45,008 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.63 vs. limit=15.0
2023-11-19 17:07:50,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=733066.6666666666, ans=0.0
2023-11-19 17:07:51,856 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 17:08:30,345 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 110000
2023-11-19 17:08:38,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=733333.3333333334, ans=0.125
2023-11-19 17:08:46,822 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 1800, loss[loss=0.0719, simple_loss=0.08948, pruned_loss=0.0188, audio_tagging_loss=0.008358, over 14879.00 frames. ], tot_loss[loss=0.08615, simple_loss=0.1051, pruned_loss=0.02306, audio_tagging_loss=0.01052, over 3049162.13 frames. ], batch size: 56, lr: 7.07e-03, grad_scale: 32.0
2023-11-19 17:08:47,491 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.36 vs. limit=22.5
2023-11-19 17:08:52,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=733400.0, ans=0.1
2023-11-19 17:09:00,177 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.669e+01 8.372e+01 9.088e+01 1.009e+02 1.305e+02, threshold=1.818e+02, percent-clipped=0.0
2023-11-19 17:09:17,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=733533.3333333334, ans=0.0
2023-11-19 17:09:27,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=733600.0, ans=0.0
2023-11-19 17:09:35,485 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 110050
2023-11-19 17:09:42,021 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.69 vs. limit=15.0
2023-11-19 17:09:50,022 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 1850, loss[loss=0.0954, simple_loss=0.1206, pruned_loss=0.02483, audio_tagging_loss=0.01025, over 14437.00 frames. ], tot_loss[loss=0.08499, simple_loss=0.1038, pruned_loss=0.02262, audio_tagging_loss=0.01049, over 3044536.74 frames. ], batch size: 57, lr: 7.06e-03, grad_scale: 16.0
2023-11-19 17:10:38,798 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 110100
2023-11-19 17:10:39,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=733933.3333333334, ans=0.125
2023-11-19 17:10:49,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=734000.0, ans=0.125
2023-11-19 17:10:54,480 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 1900, loss[loss=0.09168, simple_loss=0.1083, pruned_loss=0.02746, audio_tagging_loss=0.01006, over 15636.00 frames. ], tot_loss[loss=0.08573, simple_loss=0.1049, pruned_loss=0.0229, audio_tagging_loss=0.01037, over 3044897.53 frames. ], batch size: 59, lr: 7.06e-03, grad_scale: 16.0
2023-11-19 17:11:08,897 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.512e+01 8.528e+01 8.978e+01 9.700e+01 1.316e+02, threshold=1.796e+02, percent-clipped=0.0
2023-11-19 17:11:14,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=734133.3333333334, ans=0.025
2023-11-19 17:11:15,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=734133.3333333334, ans=0.025
2023-11-19 17:11:43,532 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 110150
2023-11-19 17:11:43,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=734266.6666666666, ans=0.125
2023-11-19 17:11:46,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=734333.3333333334, ans=0.0
2023-11-19 17:11:49,136 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.18 vs.
limit=15.0 2023-11-19 17:11:55,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=734333.3333333334, ans=0.125 2023-11-19 17:11:59,393 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 1950, loss[loss=0.08503, simple_loss=0.1039, pruned_loss=0.01992, audio_tagging_loss=0.01316, over 14331.00 frames. ], tot_loss[loss=0.08487, simple_loss=0.1039, pruned_loss=0.02252, audio_tagging_loss=0.01041, over 3042577.61 frames. ], batch size: 53, lr: 7.06e-03, grad_scale: 16.0 2023-11-19 17:12:02,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=734400.0, ans=0.125 2023-11-19 17:12:02,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=734400.0, ans=0.2 2023-11-19 17:12:33,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=734533.3333333334, ans=0.0 2023-11-19 17:12:48,382 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 110200 2023-11-19 17:13:01,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=734666.6666666666, ans=0.125 2023-11-19 17:13:03,588 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 2000, loss[loss=0.09589, simple_loss=0.108, pruned_loss=0.02978, audio_tagging_loss=0.01212, over 14546.00 frames. ], tot_loss[loss=0.08487, simple_loss=0.1036, pruned_loss=0.02259, audio_tagging_loss=0.01049, over 3034511.37 frames. ], batch size: 54, lr: 7.06e-03, grad_scale: 32.0 2023-11-19 17:13:09,259 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.34 vs. limit=15.0 2023-11-19 17:13:17,911 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.098e+01 8.336e+01 8.914e+01 9.433e+01 1.309e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-19 17:13:32,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=734866.6666666666, ans=0.1 2023-11-19 17:13:34,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=734866.6666666666, ans=0.0 2023-11-19 17:13:52,751 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 110250 2023-11-19 17:13:52,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=734933.3333333334, ans=0.035 2023-11-19 17:14:04,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=735000.0, ans=0.0 2023-11-19 17:14:08,076 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 2050, loss[loss=0.1207, simple_loss=0.1458, pruned_loss=0.04064, audio_tagging_loss=0.00718, over 15605.00 frames. ], tot_loss[loss=0.08587, simple_loss=0.105, pruned_loss=0.02297, audio_tagging_loss=0.01041, over 3035289.70 frames. 
], batch size: 56, lr: 7.06e-03, grad_scale: 32.0 2023-11-19 17:14:09,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=735066.6666666666, ans=0.125 2023-11-19 17:14:18,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=735066.6666666666, ans=0.09899494936611666 2023-11-19 17:14:27,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=735133.3333333334, ans=0.125 2023-11-19 17:14:55,498 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 110300 2023-11-19 17:15:10,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=735400.0, ans=0.0 2023-11-19 17:15:10,850 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 2100, loss[loss=0.1136, simple_loss=0.1563, pruned_loss=0.03029, audio_tagging_loss=0.005172, over 14916.00 frames. ], tot_loss[loss=0.08593, simple_loss=0.1051, pruned_loss=0.02304, audio_tagging_loss=0.01034, over 3034997.67 frames. ], batch size: 53, lr: 7.06e-03, grad_scale: 32.0 2023-11-19 17:15:15,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=735400.0, ans=0.0 2023-11-19 17:15:25,432 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.080e+01 8.159e+01 8.890e+01 9.967e+01 1.434e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-19 17:15:35,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=735533.3333333334, ans=0.0 2023-11-19 17:15:44,591 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.89 vs. limit=10.0 2023-11-19 17:15:48,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=735600.0, ans=0.125 2023-11-19 17:15:55,323 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.73 vs. limit=22.5 2023-11-19 17:15:59,791 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 110350 2023-11-19 17:16:15,537 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 2150, loss[loss=0.09602, simple_loss=0.1233, pruned_loss=0.0259, audio_tagging_loss=0.008458, over 15661.00 frames. ], tot_loss[loss=0.08622, simple_loss=0.1055, pruned_loss=0.02313, audio_tagging_loss=0.01034, over 3035078.28 frames. ], batch size: 55, lr: 7.05e-03, grad_scale: 32.0 2023-11-19 17:16:24,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=735733.3333333334, ans=0.0 2023-11-19 17:16:24,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=735733.3333333334, ans=0.1 2023-11-19 17:16:29,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=735800.0, ans=0.0 2023-11-19 17:16:33,509 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.71 vs. 
limit=15.0 2023-11-19 17:16:37,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=735800.0, ans=0.1 2023-11-19 17:16:37,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=735800.0, ans=0.0 2023-11-19 17:16:46,586 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.79 vs. limit=15.0 2023-11-19 17:16:49,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=735866.6666666666, ans=0.07 2023-11-19 17:16:55,022 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 17:17:01,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=735933.3333333334, ans=0.125 2023-11-19 17:17:04,969 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 110400 2023-11-19 17:17:16,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=736000.0, ans=0.125 2023-11-19 17:17:20,325 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 2200, loss[loss=0.1074, simple_loss=0.1318, pruned_loss=0.03284, audio_tagging_loss=0.008644, over 16212.00 frames. ], tot_loss[loss=0.08612, simple_loss=0.1056, pruned_loss=0.02299, audio_tagging_loss=0.01035, over 3042693.46 frames. ], batch size: 56, lr: 7.05e-03, grad_scale: 16.0 2023-11-19 17:17:25,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=736066.6666666666, ans=0.125 2023-11-19 17:17:35,744 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.097e+01 8.594e+01 9.409e+01 1.055e+02 1.451e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-19 17:17:37,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=736133.3333333334, ans=0.0 2023-11-19 17:17:37,319 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 17:17:55,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=736200.0, ans=0.0 2023-11-19 17:18:03,125 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.57 vs. 
limit=15.0 2023-11-19 17:18:04,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=736266.6666666666, ans=0.2 2023-11-19 17:18:10,098 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 110450 2023-11-19 17:18:12,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=736333.3333333334, ans=0.125 2023-11-19 17:18:25,402 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 2250, loss[loss=0.07222, simple_loss=0.08961, pruned_loss=0.01699, audio_tagging_loss=0.01043, over 15435.00 frames. ], tot_loss[loss=0.08605, simple_loss=0.1052, pruned_loss=0.02305, audio_tagging_loss=0.01039, over 3038710.58 frames. ], batch size: 59, lr: 7.05e-03, grad_scale: 16.0 2023-11-19 17:18:36,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=736400.0, ans=0.0 2023-11-19 17:19:03,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=736600.0, ans=0.02 2023-11-19 17:19:15,056 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 110500 2023-11-19 17:19:18,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=736666.6666666666, ans=0.1 2023-11-19 17:19:25,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=736666.6666666666, ans=0.125 2023-11-19 17:19:27,268 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.54 vs. limit=15.0 2023-11-19 17:19:31,166 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.13 vs. limit=22.5 2023-11-19 17:19:31,620 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 2300, loss[loss=0.09678, simple_loss=0.1151, pruned_loss=0.02756, audio_tagging_loss=0.01167, over 14432.00 frames. ], tot_loss[loss=0.0856, simple_loss=0.1043, pruned_loss=0.02299, audio_tagging_loss=0.01045, over 3031625.68 frames. ], batch size: 56, lr: 7.05e-03, grad_scale: 8.0 2023-11-19 17:19:35,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=736733.3333333334, ans=0.125 2023-11-19 17:19:42,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=736800.0, ans=0.0 2023-11-19 17:19:47,047 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.24 vs. limit=12.0 2023-11-19 17:19:47,340 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.474e+01 9.296e+01 1.022e+02 1.350e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-19 17:20:21,189 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 110550 2023-11-19 17:20:28,659 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 17:20:30,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=737000.0, ans=0.0 2023-11-19 17:20:36,088 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 2350, loss[loss=0.08188, simple_loss=0.09633, pruned_loss=0.022, audio_tagging_loss=0.01171, over 14468.00 frames. ], tot_loss[loss=0.08516, simple_loss=0.1033, pruned_loss=0.02291, audio_tagging_loss=0.01058, over 3029497.15 frames. ], batch size: 54, lr: 7.05e-03, grad_scale: 8.0 2023-11-19 17:20:39,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=737066.6666666666, ans=0.125 2023-11-19 17:20:59,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=737133.3333333334, ans=0.125 2023-11-19 17:21:03,799 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.05 vs. limit=15.0 2023-11-19 17:21:08,429 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.47 vs. limit=15.0 2023-11-19 17:21:25,371 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 110600 2023-11-19 17:21:34,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=737333.3333333334, ans=0.125 2023-11-19 17:21:39,992 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.84 vs. limit=15.0 2023-11-19 17:21:40,460 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 2400, loss[loss=0.09008, simple_loss=0.1179, pruned_loss=0.02346, audio_tagging_loss=0.007649, over 14595.00 frames. ], tot_loss[loss=0.0857, simple_loss=0.1042, pruned_loss=0.023, audio_tagging_loss=0.01057, over 3032341.28 frames. ], batch size: 53, lr: 7.05e-03, grad_scale: 16.0 2023-11-19 17:21:58,860 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.978e+01 8.550e+01 9.088e+01 1.010e+02 1.299e+02, threshold=1.818e+02, percent-clipped=0.0 2023-11-19 17:22:00,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=737466.6666666666, ans=0.125 2023-11-19 17:22:06,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=737533.3333333334, ans=0.2 2023-11-19 17:22:09,180 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.34 vs. 
limit=15.0 2023-11-19 17:22:17,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=737533.3333333334, ans=0.125 2023-11-19 17:22:21,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=737600.0, ans=0.125 2023-11-19 17:22:29,534 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 110650 2023-11-19 17:22:33,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=737666.6666666666, ans=0.125 2023-11-19 17:22:46,858 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 2450, loss[loss=0.07841, simple_loss=0.08726, pruned_loss=0.02309, audio_tagging_loss=0.01169, over 13481.00 frames. ], tot_loss[loss=0.08608, simple_loss=0.1048, pruned_loss=0.02314, audio_tagging_loss=0.01054, over 3031187.70 frames. ], batch size: 51, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 17:23:08,124 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.69 vs. limit=15.0 2023-11-19 17:23:13,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=737866.6666666666, ans=0.0 2023-11-19 17:23:15,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=737866.6666666666, ans=0.1 2023-11-19 17:23:35,181 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 110700 2023-11-19 17:23:39,322 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.40 vs. limit=15.0 2023-11-19 17:23:47,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=738000.0, ans=0.125 2023-11-19 17:23:49,625 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 2500, loss[loss=0.09638, simple_loss=0.1134, pruned_loss=0.02858, audio_tagging_loss=0.01112, over 15039.00 frames. ], tot_loss[loss=0.08666, simple_loss=0.1055, pruned_loss=0.02339, audio_tagging_loss=0.01053, over 3038487.33 frames. ], batch size: 56, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 17:24:05,694 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.741e+01 8.346e+01 8.795e+01 9.751e+01 1.396e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-19 17:24:38,646 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 110750 2023-11-19 17:24:53,097 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 2550, loss[loss=0.08319, simple_loss=0.1001, pruned_loss=0.02371, audio_tagging_loss=0.009433, over 15555.00 frames. ], tot_loss[loss=0.0865, simple_loss=0.1055, pruned_loss=0.02329, audio_tagging_loss=0.01044, over 3038408.33 frames. ], batch size: 57, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 17:24:59,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=738400.0, ans=0.125 2023-11-19 17:25:01,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=738400.0, ans=0.2 2023-11-19 17:25:07,232 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.82 vs. 
limit=15.0 2023-11-19 17:25:29,284 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.34 vs. limit=15.0 2023-11-19 17:25:41,765 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.67 vs. limit=15.0 2023-11-19 17:25:42,394 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 110800 2023-11-19 17:26:00,185 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 2600, loss[loss=0.0714, simple_loss=0.08773, pruned_loss=0.01861, audio_tagging_loss=0.008923, over 14480.00 frames. ], tot_loss[loss=0.08523, simple_loss=0.104, pruned_loss=0.02285, audio_tagging_loss=0.01035, over 3033483.34 frames. ], batch size: 56, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 17:26:16,152 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.092e+01 8.270e+01 8.898e+01 9.575e+01 2.029e+02, threshold=1.780e+02, percent-clipped=1.0 2023-11-19 17:26:19,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=738800.0, ans=0.125 2023-11-19 17:26:22,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=738800.0, ans=0.0 2023-11-19 17:26:24,438 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2023-11-19 17:26:31,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=738866.6666666666, ans=0.0 2023-11-19 17:26:41,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=738933.3333333334, ans=0.125 2023-11-19 17:26:46,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=738933.3333333334, ans=0.125 2023-11-19 17:26:49,063 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 110850 2023-11-19 17:27:03,858 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 2650, loss[loss=0.1076, simple_loss=0.1329, pruned_loss=0.03189, audio_tagging_loss=0.009284, over 15971.00 frames. ], tot_loss[loss=0.08473, simple_loss=0.1036, pruned_loss=0.02257, audio_tagging_loss=0.01035, over 3040095.69 frames. ], batch size: 61, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 17:27:05,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=739066.6666666666, ans=0.0 2023-11-19 17:27:11,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=739066.6666666666, ans=0.0 2023-11-19 17:27:13,098 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.23 vs. 
limit=10.0 2023-11-19 17:27:13,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=739066.6666666666, ans=0.125 2023-11-19 17:27:39,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=739200.0, ans=0.125 2023-11-19 17:27:53,053 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 110900 2023-11-19 17:28:07,640 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 2700, loss[loss=0.1018, simple_loss=0.1296, pruned_loss=0.02865, audio_tagging_loss=0.008351, over 15945.00 frames. ], tot_loss[loss=0.08444, simple_loss=0.1032, pruned_loss=0.02246, audio_tagging_loss=0.01036, over 3046144.70 frames. ], batch size: 59, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 17:28:15,716 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.11 vs. limit=22.5 2023-11-19 17:28:25,562 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.580e+01 8.552e+01 9.403e+01 1.042e+02 1.397e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-19 17:28:37,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=739533.3333333334, ans=0.125 2023-11-19 17:28:52,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=739600.0, ans=0.1 2023-11-19 17:28:57,045 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 110950 2023-11-19 17:28:57,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=739600.0, ans=0.125 2023-11-19 17:29:01,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=739666.6666666666, ans=0.2 2023-11-19 17:29:03,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=739666.6666666666, ans=0.0 2023-11-19 17:29:03,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=739666.6666666666, ans=0.2 2023-11-19 17:29:06,147 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.28 vs. limit=22.5 2023-11-19 17:29:12,986 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 2750, loss[loss=0.08685, simple_loss=0.1135, pruned_loss=0.02255, audio_tagging_loss=0.007572, over 13819.00 frames. ], tot_loss[loss=0.08502, simple_loss=0.1041, pruned_loss=0.02266, audio_tagging_loss=0.01032, over 3046677.26 frames. ], batch size: 54, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 17:29:17,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=739733.3333333334, ans=0.125 2023-11-19 17:29:18,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=739733.3333333334, ans=0.1 2023-11-19 17:29:48,132 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.00 vs. 
limit=15.0 2023-11-19 17:30:01,042 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 111000 2023-11-19 17:30:06,997 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 17:30:09,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=740000.0, ans=0.1 2023-11-19 17:30:16,677 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 2800, loss[loss=0.1099, simple_loss=0.1406, pruned_loss=0.03022, audio_tagging_loss=0.009391, over 15627.00 frames. ], tot_loss[loss=0.08466, simple_loss=0.1034, pruned_loss=0.02268, audio_tagging_loss=0.01028, over 3045938.33 frames. ], batch size: 56, lr: 7.03e-03, grad_scale: 32.0 2023-11-19 17:30:18,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=740066.6666666666, ans=0.0 2023-11-19 17:30:32,829 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.580e+01 8.267e+01 8.759e+01 9.728e+01 1.191e+02, threshold=1.752e+02, percent-clipped=0.0 2023-11-19 17:30:34,888 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.23 vs. limit=15.0 2023-11-19 17:30:38,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=740133.3333333334, ans=0.125 2023-11-19 17:30:46,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=740200.0, ans=0.0 2023-11-19 17:31:05,936 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 111050 2023-11-19 17:31:13,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=740333.3333333334, ans=0.125 2023-11-19 17:31:15,030 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.95 vs. limit=6.0 2023-11-19 17:31:20,596 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 2850, loss[loss=0.1209, simple_loss=0.15, pruned_loss=0.03793, audio_tagging_loss=0.008042, over 15128.00 frames. ], tot_loss[loss=0.08582, simple_loss=0.1051, pruned_loss=0.02315, audio_tagging_loss=0.01015, over 3044320.28 frames. ], batch size: 55, lr: 7.03e-03, grad_scale: 32.0 2023-11-19 17:31:28,730 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.63 vs. limit=22.5 2023-11-19 17:31:58,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=740600.0, ans=0.1 2023-11-19 17:32:02,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=740600.0, ans=0.0 2023-11-19 17:32:06,382 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.45 vs. 
limit=15.0 2023-11-19 17:32:09,365 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 111100 2023-11-19 17:32:14,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=740666.6666666666, ans=0.125 2023-11-19 17:32:16,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=740666.6666666666, ans=0.125 2023-11-19 17:32:25,179 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 2900, loss[loss=0.07503, simple_loss=0.08698, pruned_loss=0.01809, audio_tagging_loss=0.01345, over 16233.00 frames. ], tot_loss[loss=0.08599, simple_loss=0.1051, pruned_loss=0.02321, audio_tagging_loss=0.01025, over 3044155.81 frames. ], batch size: 64, lr: 7.03e-03, grad_scale: 16.0 2023-11-19 17:32:25,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=740733.3333333334, ans=0.0 2023-11-19 17:32:25,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=740733.3333333334, ans=0.1 2023-11-19 17:32:43,308 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.731e+01 8.402e+01 9.299e+01 9.982e+01 1.196e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 17:32:46,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=740800.0, ans=0.125 2023-11-19 17:32:46,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=740800.0, ans=0.125 2023-11-19 17:32:47,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=740800.0, ans=0.2 2023-11-19 17:33:14,527 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 111150 2023-11-19 17:33:16,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=741000.0, ans=0.125 2023-11-19 17:33:28,971 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 2950, loss[loss=0.07989, simple_loss=0.09821, pruned_loss=0.02199, audio_tagging_loss=0.008795, over 14702.00 frames. ], tot_loss[loss=0.08692, simple_loss=0.1064, pruned_loss=0.02346, audio_tagging_loss=0.01028, over 3040435.40 frames. ], batch size: 55, lr: 7.03e-03, grad_scale: 16.0 2023-11-19 17:33:41,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=741133.3333333334, ans=0.125 2023-11-19 17:33:41,623 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.24 vs. limit=15.0 2023-11-19 17:34:06,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=741266.6666666666, ans=0.125 2023-11-19 17:34:17,332 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 111200 2023-11-19 17:34:33,025 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 3000, loss[loss=0.1076, simple_loss=0.1361, pruned_loss=0.03148, audio_tagging_loss=0.008025, over 15821.00 frames. ], tot_loss[loss=0.08663, simple_loss=0.1058, pruned_loss=0.02336, audio_tagging_loss=0.01037, over 3042515.26 frames. 
], batch size: 57, lr: 7.03e-03, grad_scale: 16.0 2023-11-19 17:34:33,028 INFO [train_asr.py:1285] (0/4) Computing validation loss 2023-11-19 17:35:14,017 INFO [train_asr.py:1294] (0/4) Epoch 10, validation: loss=0.06437, simple_loss=0.0554, pruned_loss=0.006444, audio_tagging_loss=0.03022, over 4681554.00 frames. 2023-11-19 17:35:14,018 INFO [train_asr.py:1295] (0/4) Maximum memory allocated so far is 25925MB 2023-11-19 17:35:28,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=741466.6666666666, ans=0.0 2023-11-19 17:35:31,917 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.687e+01 8.390e+01 9.154e+01 1.009e+02 1.642e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-19 17:35:42,829 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0 2023-11-19 17:35:44,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=741533.3333333334, ans=0.2 2023-11-19 17:35:58,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=741600.0, ans=0.1 2023-11-19 17:36:03,177 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 111250 2023-11-19 17:36:17,858 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 3050, loss[loss=0.08899, simple_loss=0.1106, pruned_loss=0.02523, audio_tagging_loss=0.008448, over 15413.00 frames. ], tot_loss[loss=0.08644, simple_loss=0.1056, pruned_loss=0.0232, audio_tagging_loss=0.01044, over 3038427.65 frames. ], batch size: 58, lr: 7.03e-03, grad_scale: 16.0 2023-11-19 17:36:32,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=741800.0, ans=0.1 2023-11-19 17:36:39,964 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.85 vs. limit=12.0 2023-11-19 17:36:46,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=741866.6666666666, ans=0.125 2023-11-19 17:36:54,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=741866.6666666666, ans=0.125 2023-11-19 17:36:55,122 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 17:37:01,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=741933.3333333334, ans=0.1 2023-11-19 17:37:06,340 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 111300 2023-11-19 17:37:13,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=742000.0, ans=0.0 2023-11-19 17:37:21,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=742066.6666666666, ans=0.125 2023-11-19 17:37:21,855 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 3100, loss[loss=0.08528, simple_loss=0.09417, pruned_loss=0.02577, audio_tagging_loss=0.01243, over 13950.00 frames. ], tot_loss[loss=0.08692, simple_loss=0.1059, pruned_loss=0.02339, audio_tagging_loss=0.01058, over 3036351.75 frames. ], batch size: 55, lr: 7.02e-03, grad_scale: 16.0 2023-11-19 17:37:27,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=742066.6666666666, ans=0.95 2023-11-19 17:37:35,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=742133.3333333334, ans=0.125 2023-11-19 17:37:40,856 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.012e+01 8.266e+01 9.120e+01 9.877e+01 1.232e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-19 17:37:48,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=742200.0, ans=0.125 2023-11-19 17:37:59,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=742266.6666666666, ans=0.125 2023-11-19 17:37:59,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=742266.6666666666, ans=0.125 2023-11-19 17:38:11,354 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 111350 2023-11-19 17:38:15,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=742333.3333333334, ans=0.07 2023-11-19 17:38:21,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=742333.3333333334, ans=0.1 2023-11-19 17:38:26,710 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 3150, loss[loss=0.07657, simple_loss=0.08674, pruned_loss=0.02056, audio_tagging_loss=0.01264, over 15051.00 frames. ], tot_loss[loss=0.08685, simple_loss=0.1059, pruned_loss=0.02329, audio_tagging_loss=0.01061, over 3029588.98 frames. 
], batch size: 57, lr: 7.02e-03, grad_scale: 16.0 2023-11-19 17:38:44,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=742466.6666666666, ans=0.2 2023-11-19 17:39:15,734 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 111400 2023-11-19 17:39:18,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=742666.6666666666, ans=0.125 2023-11-19 17:39:19,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=742666.6666666666, ans=0.025 2023-11-19 17:39:21,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=742666.6666666666, ans=0.1 2023-11-19 17:39:23,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=742666.6666666666, ans=0.125 2023-11-19 17:39:32,228 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 3200, loss[loss=0.08344, simple_loss=0.1023, pruned_loss=0.022, audio_tagging_loss=0.01029, over 14390.00 frames. ], tot_loss[loss=0.08714, simple_loss=0.1063, pruned_loss=0.02334, audio_tagging_loss=0.01065, over 3034335.16 frames. ], batch size: 56, lr: 7.02e-03, grad_scale: 32.0 2023-11-19 17:39:33,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=742733.3333333334, ans=0.0 2023-11-19 17:39:43,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=742800.0, ans=0.2 2023-11-19 17:39:48,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=742800.0, ans=0.0 2023-11-19 17:39:50,364 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.737e+01 8.277e+01 9.297e+01 1.012e+02 1.250e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-19 17:40:14,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=742933.3333333334, ans=0.125 2023-11-19 17:40:18,517 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 17:40:21,063 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 17:40:22,043 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 111450 2023-11-19 17:40:22,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=742933.3333333334, ans=0.125 2023-11-19 17:40:28,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=743000.0, ans=10.0 2023-11-19 17:40:29,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=743000.0, ans=0.2 2023-11-19 17:40:31,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=743000.0, ans=0.125 2023-11-19 17:40:37,281 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 3250, loss[loss=0.06206, simple_loss=0.07444, pruned_loss=0.01449, audio_tagging_loss=0.01035, over 15863.00 frames. 
], tot_loss[loss=0.08696, simple_loss=0.1062, pruned_loss=0.02311, audio_tagging_loss=0.01076, over 3046784.44 frames. ], batch size: 61, lr: 7.02e-03, grad_scale: 32.0 2023-11-19 17:40:58,706 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.71 vs. limit=15.0 2023-11-19 17:41:25,807 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 111500 2023-11-19 17:41:29,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=743333.3333333334, ans=0.125 2023-11-19 17:41:40,576 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 3300, loss[loss=0.07154, simple_loss=0.09115, pruned_loss=0.01409, audio_tagging_loss=0.01186, over 14802.00 frames. ], tot_loss[loss=0.08567, simple_loss=0.1046, pruned_loss=0.02252, audio_tagging_loss=0.01087, over 3050895.84 frames. ], batch size: 55, lr: 7.02e-03, grad_scale: 16.0 2023-11-19 17:41:45,796 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.24 vs. limit=6.0 2023-11-19 17:41:53,598 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.87 vs. limit=10.0 2023-11-19 17:41:56,011 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.76 vs. limit=15.0 2023-11-19 17:42:00,056 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.519e+01 8.050e+01 8.952e+01 9.862e+01 1.284e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-19 17:42:09,474 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.08 vs. limit=15.0 2023-11-19 17:42:11,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=743533.3333333334, ans=0.125 2023-11-19 17:42:23,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=743600.0, ans=0.125 2023-11-19 17:42:29,509 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 111550 2023-11-19 17:42:32,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=743666.6666666666, ans=0.0 2023-11-19 17:42:45,500 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 3350, loss[loss=0.08274, simple_loss=0.09904, pruned_loss=0.02201, audio_tagging_loss=0.01121, over 13961.00 frames. ], tot_loss[loss=0.08556, simple_loss=0.1043, pruned_loss=0.02257, audio_tagging_loss=0.01085, over 3048961.09 frames. 
], batch size: 53, lr: 7.02e-03, grad_scale: 16.0 2023-11-19 17:42:53,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=743733.3333333334, ans=0.125 2023-11-19 17:43:07,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=743800.0, ans=0.125 2023-11-19 17:43:10,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=743866.6666666666, ans=0.125 2023-11-19 17:43:34,589 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 111600 2023-11-19 17:43:38,409 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.94 vs. limit=22.5 2023-11-19 17:43:47,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=744000.0, ans=0.1 2023-11-19 17:43:50,271 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 3400, loss[loss=0.08138, simple_loss=0.09519, pruned_loss=0.02327, audio_tagging_loss=0.01052, over 13188.00 frames. ], tot_loss[loss=0.08614, simple_loss=0.1053, pruned_loss=0.02289, audio_tagging_loss=0.01061, over 3051162.85 frames. ], batch size: 50, lr: 7.01e-03, grad_scale: 16.0 2023-11-19 17:43:51,165 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.36 vs. limit=22.5 2023-11-19 17:43:51,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=744066.6666666666, ans=0.2 2023-11-19 17:44:09,432 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.409e+01 9.014e+01 1.006e+02 1.399e+02, threshold=1.803e+02, percent-clipped=0.0 2023-11-19 17:44:11,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=744133.3333333334, ans=0.125 2023-11-19 17:44:14,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=744133.3333333334, ans=0.125 2023-11-19 17:44:24,536 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 17:44:27,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=744200.0, ans=0.125 2023-11-19 17:44:30,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=744266.6666666666, ans=0.0 2023-11-19 17:44:34,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=744266.6666666666, ans=0.0 2023-11-19 17:44:39,205 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 111650 2023-11-19 17:44:39,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=744266.6666666666, ans=0.0 2023-11-19 17:44:43,434 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.12 vs. limit=15.0 2023-11-19 17:44:54,675 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 3450, loss[loss=0.06253, simple_loss=0.07258, pruned_loss=0.01423, audio_tagging_loss=0.012, over 15367.00 frames. 
], tot_loss[loss=0.08614, simple_loss=0.1054, pruned_loss=0.02292, audio_tagging_loss=0.01051, over 3052462.31 frames. ], batch size: 58, lr: 7.01e-03, grad_scale: 16.0 2023-11-19 17:45:10,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=744466.6666666666, ans=0.07 2023-11-19 17:45:12,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=744466.6666666666, ans=0.125 2023-11-19 17:45:27,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=744533.3333333334, ans=0.125 2023-11-19 17:45:43,784 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 111700 2023-11-19 17:45:47,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=744666.6666666666, ans=0.125 2023-11-19 17:45:48,293 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=15.0 2023-11-19 17:45:59,671 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 3500, loss[loss=0.1156, simple_loss=0.1472, pruned_loss=0.03692, audio_tagging_loss=0.005083, over 16389.00 frames. ], tot_loss[loss=0.08601, simple_loss=0.1052, pruned_loss=0.02297, audio_tagging_loss=0.01041, over 3048591.58 frames. ], batch size: 60, lr: 7.01e-03, grad_scale: 16.0 2023-11-19 17:45:59,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=744733.3333333334, ans=0.125 2023-11-19 17:46:06,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=744733.3333333334, ans=0.025 2023-11-19 17:46:06,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=744733.3333333334, ans=0.0 2023-11-19 17:46:10,225 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.93 vs. limit=15.0 2023-11-19 17:46:18,195 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.875e+01 8.242e+01 8.864e+01 9.843e+01 1.271e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-19 17:46:31,157 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 17:46:43,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=744933.3333333334, ans=0.07 2023-11-19 17:46:48,488 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 111750 2023-11-19 17:46:49,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=745000.0, ans=0.0 2023-11-19 17:46:53,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=745000.0, ans=0.125 2023-11-19 17:46:54,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=745000.0, ans=0.2 2023-11-19 17:46:56,417 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.12 vs. limit=6.0 2023-11-19 17:46:57,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=745000.0, ans=0.125 2023-11-19 17:46:59,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=745000.0, ans=0.0 2023-11-19 17:47:03,401 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 3550, loss[loss=0.05324, simple_loss=0.05527, pruned_loss=0.01261, audio_tagging_loss=0.01299, over 14303.00 frames. ], tot_loss[loss=0.08526, simple_loss=0.1042, pruned_loss=0.02275, audio_tagging_loss=0.01042, over 3042214.27 frames. ], batch size: 55, lr: 7.01e-03, grad_scale: 16.0 2023-11-19 17:47:19,472 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 17:47:43,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=745266.6666666666, ans=0.2 2023-11-19 17:47:52,459 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 111800 2023-11-19 17:47:57,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=745333.3333333334, ans=0.125 2023-11-19 17:47:59,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=745333.3333333334, ans=0.2 2023-11-19 17:48:05,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=745333.3333333334, ans=0.2 2023-11-19 17:48:07,844 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 3600, loss[loss=0.07133, simple_loss=0.07539, pruned_loss=0.0209, audio_tagging_loss=0.01274, over 15367.00 frames. ], tot_loss[loss=0.08467, simple_loss=0.1036, pruned_loss=0.02244, audio_tagging_loss=0.01044, over 3037934.68 frames. ], batch size: 61, lr: 7.01e-03, grad_scale: 32.0 2023-11-19 17:48:18,365 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.25 vs. 
limit=22.5 2023-11-19 17:48:28,209 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.581e+01 8.234e+01 9.119e+01 9.988e+01 1.352e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-19 17:48:28,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=745466.6666666666, ans=0.2 2023-11-19 17:48:35,058 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.86 vs. limit=15.0 2023-11-19 17:48:47,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=745600.0, ans=0.0 2023-11-19 17:48:56,736 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 111850 2023-11-19 17:49:03,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=745666.6666666666, ans=0.125 2023-11-19 17:49:13,506 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 3650, loss[loss=0.09207, simple_loss=0.1154, pruned_loss=0.02231, audio_tagging_loss=0.01209, over 15443.00 frames. ], tot_loss[loss=0.08551, simple_loss=0.1047, pruned_loss=0.02275, audio_tagging_loss=0.01039, over 3036529.55 frames. ], batch size: 59, lr: 7.01e-03, grad_scale: 32.0 2023-11-19 17:49:24,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=745800.0, ans=0.1 2023-11-19 17:49:28,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=745800.0, ans=0.125 2023-11-19 17:49:29,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=745800.0, ans=0.09899494936611666 2023-11-19 17:49:42,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=745866.6666666666, ans=0.125 2023-11-19 17:50:02,611 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 111900 2023-11-19 17:50:16,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=746066.6666666666, ans=0.125 2023-11-19 17:50:17,555 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 3700, loss[loss=0.1169, simple_loss=0.1524, pruned_loss=0.03343, audio_tagging_loss=0.007232, over 15719.00 frames. ], tot_loss[loss=0.08582, simple_loss=0.1051, pruned_loss=0.02291, audio_tagging_loss=0.01038, over 3045316.33 frames. ], batch size: 56, lr: 7.01e-03, grad_scale: 32.0 2023-11-19 17:50:20,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=746066.6666666666, ans=0.125 2023-11-19 17:50:28,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=746133.3333333334, ans=0.0 2023-11-19 17:50:35,746 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 8.609e+01 9.395e+01 1.090e+02 1.567e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-19 17:51:05,381 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 111950 2023-11-19 17:51:10,689 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.78 vs. 
limit=22.5 2023-11-19 17:51:14,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=746333.3333333334, ans=0.0 2023-11-19 17:51:15,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=746333.3333333334, ans=0.1 2023-11-19 17:51:16,990 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.06 vs. limit=10.0 2023-11-19 17:51:20,089 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 3750, loss[loss=0.09752, simple_loss=0.1176, pruned_loss=0.02929, audio_tagging_loss=0.00943, over 15589.00 frames. ], tot_loss[loss=0.08623, simple_loss=0.1054, pruned_loss=0.02309, audio_tagging_loss=0.01042, over 3046982.55 frames. ], batch size: 58, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 17:51:23,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=746400.0, ans=15.0 2023-11-19 17:51:25,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=746400.0, ans=0.2 2023-11-19 17:51:28,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=746400.0, ans=0.125 2023-11-19 17:51:44,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=746466.6666666666, ans=0.125 2023-11-19 17:51:49,989 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.20 vs. limit=15.0 2023-11-19 17:52:03,715 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 17:52:08,669 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 112000 2023-11-19 17:52:10,405 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-112000.pt 2023-11-19 17:52:28,345 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 3800, loss[loss=0.06081, simple_loss=0.07375, pruned_loss=0.01121, audio_tagging_loss=0.01273, over 15529.00 frames. ], tot_loss[loss=0.08679, simple_loss=0.1061, pruned_loss=0.02327, audio_tagging_loss=0.01047, over 3050063.16 frames. 
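[annotation] Right after the batch-idx line for 112000 above, checkpoint.py saves multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-112000.pt. The index is a multiple of 4000, consistent with periodic batch-level saving every save_every_n batches. A minimal sketch of that trigger, assuming save_every_n=4000; the real save_checkpoint in icefall also persists optimizer, scheduler and sampler state:

    # Sketch of the periodic batch-level checkpointing seen above
    # ("Saving checkpoint to .../checkpoint-112000.pt").

    from pathlib import Path
    import torch

    def maybe_save_checkpoint(model, batch_idx_train: int,
                              save_every_n: int = 4000,
                              exp_dir: Path = Path("exp")) -> None:
        if batch_idx_train == 0 or batch_idx_train % save_every_n != 0:
            return
        ckpt = exp_dir / f"checkpoint-{batch_idx_train}.pt"
        torch.save({"model": model.state_dict(),
                    "batch_idx_train": batch_idx_train}, ckpt)
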
], batch size: 60, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 17:52:39,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=746800.0, ans=0.125 2023-11-19 17:52:46,639 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.263e+01 8.531e+01 9.323e+01 1.047e+02 1.478e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-19 17:52:48,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=746800.0, ans=0.125 2023-11-19 17:53:16,983 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 112050 2023-11-19 17:53:18,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=747000.0, ans=0.0 2023-11-19 17:53:24,366 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 17:53:26,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=747000.0, ans=0.1 2023-11-19 17:53:31,512 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 3850, loss[loss=0.08814, simple_loss=0.1091, pruned_loss=0.02173, audio_tagging_loss=0.01188, over 16068.00 frames. ], tot_loss[loss=0.08661, simple_loss=0.106, pruned_loss=0.0232, audio_tagging_loss=0.01039, over 3049141.16 frames. ], batch size: 61, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 17:53:31,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=747066.6666666666, ans=0.125 2023-11-19 17:53:39,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=747066.6666666666, ans=0.1 2023-11-19 17:53:45,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=747133.3333333334, ans=0.125 2023-11-19 17:54:13,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=747266.6666666666, ans=0.1 2023-11-19 17:54:20,354 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 112100 2023-11-19 17:54:29,683 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.27 vs. limit=15.0 2023-11-19 17:54:33,535 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.76 vs. limit=15.0 2023-11-19 17:54:35,292 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 3900, loss[loss=0.1005, simple_loss=0.124, pruned_loss=0.02822, audio_tagging_loss=0.01029, over 15420.00 frames. ], tot_loss[loss=0.08685, simple_loss=0.1063, pruned_loss=0.02319, audio_tagging_loss=0.01053, over 3045096.91 frames. 
], batch size: 56, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 17:54:36,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=747400.0, ans=0.125 2023-11-19 17:54:42,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=747400.0, ans=0.09899494936611666 2023-11-19 17:54:52,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=747466.6666666666, ans=0.0 2023-11-19 17:54:55,654 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.123e+01 8.433e+01 9.481e+01 1.017e+02 1.565e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-19 17:55:03,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=747533.3333333334, ans=0.125 2023-11-19 17:55:15,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=747600.0, ans=0.1 2023-11-19 17:55:16,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=747600.0, ans=0.0 2023-11-19 17:55:24,489 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 112150 2023-11-19 17:55:37,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=747666.6666666666, ans=0.015 2023-11-19 17:55:40,524 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 3950, loss[loss=0.07345, simple_loss=0.07611, pruned_loss=0.01698, audio_tagging_loss=0.01842, over 14970.00 frames. ], tot_loss[loss=0.08627, simple_loss=0.1053, pruned_loss=0.02291, audio_tagging_loss=0.01069, over 3033655.62 frames. ], batch size: 57, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 17:55:51,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=747733.3333333334, ans=0.1 2023-11-19 17:55:51,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=747733.3333333334, ans=0.125 2023-11-19 17:55:54,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=747800.0, ans=0.0 2023-11-19 17:56:09,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=747866.6666666666, ans=0.0 2023-11-19 17:56:29,197 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 112200 2023-11-19 17:56:30,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=748000.0, ans=0.02 2023-11-19 17:56:30,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=748000.0, ans=0.0 2023-11-19 17:56:44,796 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 4000, loss[loss=0.09493, simple_loss=0.1152, pruned_loss=0.02893, audio_tagging_loss=0.008406, over 15353.00 frames. ], tot_loss[loss=0.08682, simple_loss=0.1056, pruned_loss=0.02321, audio_tagging_loss=0.0108, over 3037460.19 frames. 
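[annotation] The lr values in these records decay slowly (7.01e-03 down to 7.00e-03 across this stretch), which is the signature of the Eden schedule in icefall's optim.py: a product of inverse-quartic-root factors in the batch and epoch counters. A worked check, assuming this run's base_lr=0.045, lr_batches=7500, lr_epochs=3.5, and an epoch term of 9; that the scheduler's epoch argument behaves as completed epochs is inferred from the numbers, not stated in the log:

    # Reproduce the logged "lr:" values with the Eden schedule (optim.py):
    #   lr = base_lr * ((batch^2 + B^2)/B^2)^-0.25 * ((epoch^2 + E^2)/E^2)^-0.25

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
        epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    print(f"{eden_lr(0.045, batch=112000, epoch=9):.2e}")
    # -> 7.00e-03, matching the lr logged around batch idx 112000 above
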
], batch size: 56, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 17:56:47,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=748066.6666666666, ans=0.125 2023-11-19 17:56:49,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=748066.6666666666, ans=0.125 2023-11-19 17:56:49,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=748066.6666666666, ans=0.0 2023-11-19 17:56:56,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=748133.3333333334, ans=0.125 2023-11-19 17:57:04,929 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.918e+01 8.458e+01 9.188e+01 1.037e+02 1.473e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-19 17:57:07,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=748133.3333333334, ans=0.125 2023-11-19 17:57:17,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=748200.0, ans=0.0 2023-11-19 17:57:28,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=748266.6666666666, ans=0.1 2023-11-19 17:57:30,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=748266.6666666666, ans=0.2 2023-11-19 17:57:34,112 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 112250 2023-11-19 17:57:34,916 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.30 vs. limit=15.0 2023-11-19 17:57:37,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=748333.3333333334, ans=0.0 2023-11-19 17:57:40,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=748333.3333333334, ans=0.125 2023-11-19 17:57:48,610 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 4050, loss[loss=0.1121, simple_loss=0.1479, pruned_loss=0.03282, audio_tagging_loss=0.005301, over 16042.00 frames. ], tot_loss[loss=0.08747, simple_loss=0.1064, pruned_loss=0.02362, audio_tagging_loss=0.01065, over 3034601.29 frames. ], batch size: 57, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 17:57:51,087 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 17:57:51,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=748400.0, ans=0.125 2023-11-19 17:58:14,032 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.63 vs. 
limit=15.0 2023-11-19 17:58:15,139 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.18 vs. limit=22.5 2023-11-19 17:58:31,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=748600.0, ans=0.2 2023-11-19 17:58:37,385 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 112300 2023-11-19 17:58:39,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=748666.6666666666, ans=0.125 2023-11-19 17:58:52,496 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 4100, loss[loss=0.06045, simple_loss=0.06806, pruned_loss=0.01341, audio_tagging_loss=0.01301, over 15484.00 frames. ], tot_loss[loss=0.08751, simple_loss=0.1069, pruned_loss=0.02348, audio_tagging_loss=0.01059, over 3036350.04 frames. ], batch size: 60, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 17:58:55,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=748733.3333333334, ans=0.025 2023-11-19 17:59:13,894 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.713e+01 8.246e+01 9.038e+01 9.964e+01 1.289e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-19 17:59:27,213 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.57 vs. limit=15.0 2023-11-19 17:59:33,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=748933.3333333334, ans=0.1 2023-11-19 17:59:35,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=748933.3333333334, ans=0.125 2023-11-19 17:59:42,019 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 112350 2023-11-19 17:59:56,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=749066.6666666666, ans=0.0 2023-11-19 17:59:57,407 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 4150, loss[loss=0.08181, simple_loss=0.1035, pruned_loss=0.0198, audio_tagging_loss=0.01025, over 15539.00 frames. ], tot_loss[loss=0.08743, simple_loss=0.1069, pruned_loss=0.02352, audio_tagging_loss=0.01046, over 3040110.22 frames. ], batch size: 57, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 17:59:59,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=749066.6666666666, ans=0.02 2023-11-19 18:00:12,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=749133.3333333334, ans=0.125 2023-11-19 18:00:21,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=749200.0, ans=0.125 2023-11-19 18:00:44,012 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 18:00:46,537 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 112400 2023-11-19 18:01:01,974 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 4200, loss[loss=0.08172, simple_loss=0.1112, pruned_loss=0.01646, audio_tagging_loss=0.009641, over 15499.00 frames. ], tot_loss[loss=0.08705, simple_loss=0.1069, pruned_loss=0.02335, audio_tagging_loss=0.01025, over 3038966.97 frames. ], batch size: 55, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 18:01:16,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=749466.6666666666, ans=0.0 2023-11-19 18:01:23,168 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.124e+01 8.470e+01 8.967e+01 9.932e+01 1.345e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-19 18:01:43,742 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 18:01:50,889 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 112450 2023-11-19 18:02:05,493 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 4250, loss[loss=0.08704, simple_loss=0.1067, pruned_loss=0.02326, audio_tagging_loss=0.01043, over 15157.00 frames. ], tot_loss[loss=0.08788, simple_loss=0.1084, pruned_loss=0.02347, audio_tagging_loss=0.01019, over 3051873.66 frames. ], batch size: 56, lr: 6.99e-03, grad_scale: 16.0 2023-11-19 18:02:23,852 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.34 vs. limit=10.0 2023-11-19 18:02:28,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=749800.0, ans=0.125 2023-11-19 18:02:34,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=749866.6666666666, ans=6.0 2023-11-19 18:02:54,485 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 112500 2023-11-19 18:02:55,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=750000.0, ans=0.125 2023-11-19 18:03:05,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=750000.0, ans=0.125 2023-11-19 18:03:10,400 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 4300, loss[loss=0.08366, simple_loss=0.1049, pruned_loss=0.02093, audio_tagging_loss=0.01027, over 15087.00 frames. ], tot_loss[loss=0.08764, simple_loss=0.1082, pruned_loss=0.02337, audio_tagging_loss=0.01017, over 3053504.07 frames. 
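[annotation] The WARNING above drops another AudioSet placeholder cut: its 100 input frames shrink to 23 after the convolutional front-end, fewer than its 24 BPE tokens, and the transducer loss is undefined when the label sequence is longer than the encoder output. A sketch of that length filter, assuming the ((T - 7) // 2 + 1) // 2 arithmetic of the Conv2dSubsampling embed used in these recipes, which reproduces the logged 100 -> 23:

    # Sketch of the filter behind the "Exclude cut with ID ..." warnings.

    def frames_after_subsampling(num_frames: int) -> int:
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # 23, as in the warning above
    print(keep_cut(100, 24))              # False -> cut is excluded
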
], batch size: 58, lr: 6.99e-03, grad_scale: 16.0 2023-11-19 18:03:30,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=750133.3333333334, ans=0.07 2023-11-19 18:03:31,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=750133.3333333334, ans=0.0 2023-11-19 18:03:31,891 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.838e+01 9.432e+01 1.009e+02 1.921e+02, threshold=1.886e+02, percent-clipped=1.0 2023-11-19 18:03:35,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=750200.0, ans=0.125 2023-11-19 18:03:38,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=750200.0, ans=0.125 2023-11-19 18:03:56,916 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 18:03:59,307 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 112550 2023-11-19 18:04:02,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=750333.3333333334, ans=0.125 2023-11-19 18:04:14,897 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 4350, loss[loss=0.1195, simple_loss=0.1546, pruned_loss=0.03321, audio_tagging_loss=0.009047, over 16517.00 frames. ], tot_loss[loss=0.08719, simple_loss=0.1076, pruned_loss=0.02318, audio_tagging_loss=0.01023, over 3052710.13 frames. ], batch size: 57, lr: 6.99e-03, grad_scale: 16.0 2023-11-19 18:04:28,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=750466.6666666666, ans=0.05 2023-11-19 18:04:41,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=750533.3333333334, ans=0.0 2023-11-19 18:04:43,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=750533.3333333334, ans=0.0 2023-11-19 18:05:03,477 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 112600 2023-11-19 18:05:15,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=750666.6666666666, ans=0.2 2023-11-19 18:05:18,575 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 4400, loss[loss=0.08481, simple_loss=0.1048, pruned_loss=0.02564, audio_tagging_loss=0.006767, over 14951.00 frames. ], tot_loss[loss=0.08657, simple_loss=0.1068, pruned_loss=0.02299, audio_tagging_loss=0.01017, over 3052154.45 frames. 
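[annotation] The optim.py lines with Clipping_scale=2.0 print a five-number summary (min, 25%, median, 75%, max) of recently observed gradient norms, and in every such line the threshold equals clipping_scale times the median: in the record above, 2.0 x 9.432e+01 = 1.886e+02, and percent-clipped=1.0 fits the max of 1.921e+02 exceeding that threshold. A minimal sketch of the bookkeeping only; icefall's ScaledAdam integrates this per parameter group inside the optimizer step:

    # Median-based gradient clipping as suggested by the log: keep a window
    # of recent grad norms, clip to clipping_scale * median of the window.

    from collections import deque
    import torch

    class GradNormClipper:
        def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)

        def clip_(self, params: list) -> float:
            norm = torch.norm(torch.stack(
                [p.grad.norm() for p in params if p.grad is not None])).item()
            self.norms.append(norm)
            median = sorted(self.norms)[len(self.norms) // 2]
            threshold = self.clipping_scale * median
            if norm > threshold:  # counted into "percent-clipped"
                for p in params:
                    if p.grad is not None:
                        p.grad.mul_(threshold / norm)
            return threshold      # e.g. 2.0 * 9.432e+01 = 1.886e+02
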
], batch size: 54, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 18:05:18,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=750733.3333333334, ans=0.0 2023-11-19 18:05:31,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=750800.0, ans=0.0 2023-11-19 18:05:32,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=750800.0, ans=0.125 2023-11-19 18:05:40,676 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.038e+01 8.205e+01 8.716e+01 9.862e+01 1.282e+02, threshold=1.743e+02, percent-clipped=0.0 2023-11-19 18:05:47,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=750866.6666666666, ans=0.125 2023-11-19 18:05:50,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=750866.6666666666, ans=0.1 2023-11-19 18:06:01,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=750933.3333333334, ans=0.05 2023-11-19 18:06:03,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=750933.3333333334, ans=0.1 2023-11-19 18:06:06,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=750933.3333333334, ans=0.95 2023-11-19 18:06:07,255 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 112650 2023-11-19 18:06:23,235 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 4450, loss[loss=0.09823, simple_loss=0.123, pruned_loss=0.02516, audio_tagging_loss=0.01156, over 16301.00 frames. ], tot_loss[loss=0.08671, simple_loss=0.1067, pruned_loss=0.02318, audio_tagging_loss=0.01018, over 3048600.96 frames. ], batch size: 58, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 18:06:29,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=751066.6666666666, ans=0.125 2023-11-19 18:06:41,022 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.21 vs. limit=22.5 2023-11-19 18:06:53,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=751200.0, ans=0.125 2023-11-19 18:07:11,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=751266.6666666666, ans=0.0 2023-11-19 18:07:12,218 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 112700 2023-11-19 18:07:18,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=751333.3333333334, ans=0.125 2023-11-19 18:07:26,893 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 4500, loss[loss=0.06772, simple_loss=0.08043, pruned_loss=0.0177, audio_tagging_loss=0.009808, over 15215.00 frames. ], tot_loss[loss=0.08636, simple_loss=0.106, pruned_loss=0.0231, audio_tagging_loss=0.01026, over 3054131.60 frames. 
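[annotation] Each ScheduledFloat line above reports a named regularization hyper-parameter (balancer probabilities, skip rates, dropout_p) evaluated at the current batch_count; in icefall's scaling.py these values follow piecewise-linear schedules given as (batch_count, value) breakpoints. A minimal sketch of that interpolation; the breakpoints below are illustrative, not the schedules this run actually uses:

    # Piecewise-linear schedule in the spirit of ScheduledFloat (scaling.py):
    # interpolate between breakpoints, hold constant outside them.

    from bisect import bisect_right

    def scheduled_float(batch_count: float, points) -> float:
        xs = [x for x, _ in points]
        i = bisect_right(xs, batch_count)
        if i == 0:
            return points[0][1]
        if i == len(points):
            return points[-1][1]
        (x0, y0), (x1, y1) = points[i - 1], points[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # e.g. a dropout that decays from 0.3 to 0.1 over the first 20k batches:
    schedule = [(0.0, 0.3), (20000.0, 0.1)]
    print(scheduled_float(750000.0, schedule))  # 0.1, fully decayed by now
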
], batch size: 58, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 18:07:32,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=751400.0, ans=0.2 2023-11-19 18:07:37,930 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=15.0 2023-11-19 18:07:39,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=751466.6666666666, ans=0.125 2023-11-19 18:07:46,983 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.76 vs. limit=6.0 2023-11-19 18:07:48,351 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.958e+01 8.242e+01 8.889e+01 9.724e+01 1.502e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-19 18:07:50,358 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.68 vs. limit=15.0 2023-11-19 18:08:15,522 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 112750 2023-11-19 18:08:28,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=751666.6666666666, ans=0.1 2023-11-19 18:08:30,952 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 4550, loss[loss=0.06815, simple_loss=0.08289, pruned_loss=0.0167, audio_tagging_loss=0.01001, over 14421.00 frames. ], tot_loss[loss=0.08566, simple_loss=0.1049, pruned_loss=0.02286, audio_tagging_loss=0.01036, over 3049314.57 frames. ], batch size: 56, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 18:08:34,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=751733.3333333334, ans=0.1 2023-11-19 18:08:43,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=751800.0, ans=0.0 2023-11-19 18:08:54,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=751800.0, ans=0.125 2023-11-19 18:08:56,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=751866.6666666666, ans=0.125 2023-11-19 18:09:16,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=751933.3333333334, ans=12.0 2023-11-19 18:09:19,664 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 18:09:19,771 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 112800 2023-11-19 18:09:36,090 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 4600, loss[loss=0.07565, simple_loss=0.09264, pruned_loss=0.01849, audio_tagging_loss=0.01084, over 15017.00 frames. ], tot_loss[loss=0.0857, simple_loss=0.1051, pruned_loss=0.02278, audio_tagging_loss=0.01034, over 3059436.28 frames. 
], batch size: 56, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 18:09:53,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=752133.3333333334, ans=0.0 2023-11-19 18:09:56,975 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.492e+01 8.205e+01 8.855e+01 9.599e+01 1.553e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-19 18:10:24,866 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 112850 2023-11-19 18:10:29,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=752333.3333333334, ans=0.05 2023-11-19 18:10:31,592 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.88 vs. limit=6.0 2023-11-19 18:10:39,532 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 4650, loss[loss=0.08687, simple_loss=0.1103, pruned_loss=0.02425, audio_tagging_loss=0.007471, over 14801.00 frames. ], tot_loss[loss=0.08612, simple_loss=0.1055, pruned_loss=0.02295, audio_tagging_loss=0.01041, over 3061581.91 frames. ], batch size: 55, lr: 6.98e-03, grad_scale: 16.0 2023-11-19 18:10:41,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=752400.0, ans=0.0 2023-11-19 18:11:00,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=752466.6666666666, ans=0.0 2023-11-19 18:11:05,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=752533.3333333334, ans=0.0 2023-11-19 18:11:07,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=752533.3333333334, ans=0.2 2023-11-19 18:11:16,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=752533.3333333334, ans=0.125 2023-11-19 18:11:28,745 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 112900 2023-11-19 18:11:43,078 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 4700, loss[loss=0.08886, simple_loss=0.1139, pruned_loss=0.02339, audio_tagging_loss=0.008497, over 16020.00 frames. ], tot_loss[loss=0.0863, simple_loss=0.1056, pruned_loss=0.02295, audio_tagging_loss=0.01055, over 3063311.97 frames. ], batch size: 62, lr: 6.97e-03, grad_scale: 16.0 2023-11-19 18:12:06,754 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.467e+01 8.509e+01 9.287e+01 1.024e+02 1.353e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-19 18:12:30,023 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=15.0 2023-11-19 18:12:31,881 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 112950 2023-11-19 18:12:38,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=753000.0, ans=0.0 2023-11-19 18:12:40,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=753000.0, ans=0.5 2023-11-19 18:12:40,517 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.69 vs. 
limit=15.0 2023-11-19 18:12:41,714 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.13 vs. limit=15.0 2023-11-19 18:12:49,181 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 4750, loss[loss=0.04951, simple_loss=0.04907, pruned_loss=0.009542, audio_tagging_loss=0.01543, over 13534.00 frames. ], tot_loss[loss=0.08641, simple_loss=0.1056, pruned_loss=0.02313, audio_tagging_loss=0.0105, over 3053543.14 frames. ], batch size: 53, lr: 6.97e-03, grad_scale: 16.0 2023-11-19 18:12:49,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=753066.6666666666, ans=0.125 2023-11-19 18:13:23,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=753200.0, ans=0.1 2023-11-19 18:13:37,929 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 113000 2023-11-19 18:13:42,552 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.80 vs. limit=15.0 2023-11-19 18:13:44,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=753333.3333333334, ans=0.125 2023-11-19 18:13:53,014 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 4800, loss[loss=0.1048, simple_loss=0.1266, pruned_loss=0.03126, audio_tagging_loss=0.01023, over 15324.00 frames. ], tot_loss[loss=0.08604, simple_loss=0.1048, pruned_loss=0.02293, audio_tagging_loss=0.0107, over 3052702.88 frames. ], batch size: 56, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 18:14:05,347 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.71 vs. limit=15.0 2023-11-19 18:14:11,266 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.23 vs. limit=15.0 2023-11-19 18:14:15,876 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.124e+01 8.505e+01 9.415e+01 1.014e+02 1.501e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-19 18:14:39,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=753600.0, ans=0.125 2023-11-19 18:14:41,539 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 113050 2023-11-19 18:14:46,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=753666.6666666666, ans=0.0 2023-11-19 18:14:55,998 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 4850, loss[loss=0.07044, simple_loss=0.08663, pruned_loss=0.01764, audio_tagging_loss=0.009477, over 14959.00 frames. ], tot_loss[loss=0.08585, simple_loss=0.1046, pruned_loss=0.02278, audio_tagging_loss=0.01077, over 3059309.72 frames. ], batch size: 60, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 18:15:00,355 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.13 vs. 
limit=15.0 2023-11-19 18:15:16,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=753800.0, ans=0.0 2023-11-19 18:15:32,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=753866.6666666666, ans=0.2 2023-11-19 18:15:44,943 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 113100 2023-11-19 18:15:53,032 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.76 vs. limit=15.0 2023-11-19 18:16:01,388 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 4900, loss[loss=0.08561, simple_loss=0.1061, pruned_loss=0.02124, audio_tagging_loss=0.01131, over 14576.00 frames. ], tot_loss[loss=0.08612, simple_loss=0.1049, pruned_loss=0.02301, audio_tagging_loss=0.01068, over 3054264.37 frames. ], batch size: 55, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 18:16:23,316 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.624e+01 8.196e+01 8.687e+01 9.230e+01 1.120e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-19 18:16:25,244 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.90 vs. limit=15.0 2023-11-19 18:16:31,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=754200.0, ans=0.125 2023-11-19 18:16:33,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=754200.0, ans=0.125 2023-11-19 18:16:37,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=754266.6666666666, ans=0.1 2023-11-19 18:16:43,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=754266.6666666666, ans=0.125 2023-11-19 18:16:49,905 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 113150 2023-11-19 18:16:55,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=754333.3333333334, ans=0.125 2023-11-19 18:17:04,393 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 4950, loss[loss=0.1012, simple_loss=0.1263, pruned_loss=0.02907, audio_tagging_loss=0.008992, over 15052.00 frames. ], tot_loss[loss=0.08608, simple_loss=0.1051, pruned_loss=0.02302, audio_tagging_loss=0.01052, over 3053087.54 frames. ], batch size: 56, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 18:17:16,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=754466.6666666666, ans=0.0 2023-11-19 18:17:25,951 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.54 vs. limit=10.0 2023-11-19 18:17:36,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=754533.3333333334, ans=0.015 2023-11-19 18:17:36,821 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.32 vs. 
limit=15.0 2023-11-19 18:17:52,791 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 113200 2023-11-19 18:17:52,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=754600.0, ans=0.07 2023-11-19 18:17:59,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=754666.6666666666, ans=0.125 2023-11-19 18:18:01,169 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.43 vs. limit=15.0 2023-11-19 18:18:03,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=754666.6666666666, ans=0.2 2023-11-19 18:18:07,780 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 5000, loss[loss=0.06539, simple_loss=0.08155, pruned_loss=0.01381, audio_tagging_loss=0.0108, over 14913.00 frames. ], tot_loss[loss=0.08634, simple_loss=0.1058, pruned_loss=0.02315, audio_tagging_loss=0.01027, over 3051133.78 frames. ], batch size: 58, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 18:18:14,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=754733.3333333334, ans=0.1 2023-11-19 18:18:31,435 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.773e+01 8.065e+01 8.852e+01 9.668e+01 1.212e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-19 18:18:52,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=754933.3333333334, ans=0.05 2023-11-19 18:18:55,837 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 113250 2023-11-19 18:19:11,139 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 5050, loss[loss=0.08195, simple_loss=0.1078, pruned_loss=0.02094, audio_tagging_loss=0.007101, over 14910.00 frames. ], tot_loss[loss=0.08553, simple_loss=0.1049, pruned_loss=0.02285, audio_tagging_loss=0.01021, over 3047348.36 frames. ], batch size: 55, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 18:19:27,336 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.94 vs. limit=15.0 2023-11-19 18:19:47,631 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 18:19:48,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=755266.6666666666, ans=0.125 2023-11-19 18:19:59,441 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 113300 2023-11-19 18:20:04,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=755333.3333333334, ans=0.04949747468305833 2023-11-19 18:20:12,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=755333.3333333334, ans=0.1 2023-11-19 18:20:15,884 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 5100, loss[loss=0.05976, simple_loss=0.07459, pruned_loss=0.01467, audio_tagging_loss=0.007794, over 15301.00 frames. ], tot_loss[loss=0.08529, simple_loss=0.1046, pruned_loss=0.0228, audio_tagging_loss=0.01017, over 3049002.25 frames. 
], batch size: 59, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 18:20:37,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=755466.6666666666, ans=0.0 2023-11-19 18:20:38,099 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.347e+01 7.917e+01 8.813e+01 9.876e+01 1.323e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-19 18:20:46,396 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.61 vs. limit=15.0 2023-11-19 18:21:04,726 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 113350 2023-11-19 18:21:07,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=755666.6666666666, ans=0.125 2023-11-19 18:21:19,285 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 5150, loss[loss=0.06732, simple_loss=0.07981, pruned_loss=0.01736, audio_tagging_loss=0.01006, over 15331.00 frames. ], tot_loss[loss=0.08529, simple_loss=0.1047, pruned_loss=0.02275, audio_tagging_loss=0.0102, over 3054755.74 frames. ], batch size: 58, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 18:21:20,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.out_whiten.whitening_limit, batch_count=755733.3333333334, ans=8.0 2023-11-19 18:21:34,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=755800.0, ans=0.125 2023-11-19 18:21:48,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=755866.6666666666, ans=0.0 2023-11-19 18:21:48,999 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.11 vs. limit=22.5 2023-11-19 18:21:49,671 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 18:21:58,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=755933.3333333334, ans=0.125 2023-11-19 18:22:07,689 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 113400 2023-11-19 18:22:07,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=755933.3333333334, ans=0.125 2023-11-19 18:22:12,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=756000.0, ans=0.1 2023-11-19 18:22:12,411 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.13 vs. limit=15.0 2023-11-19 18:22:14,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=756000.0, ans=0.125 2023-11-19 18:22:19,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=756000.0, ans=0.125 2023-11-19 18:22:22,826 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 5200, loss[loss=0.06599, simple_loss=0.09068, pruned_loss=0.01138, audio_tagging_loss=0.009271, over 14535.00 frames. ], tot_loss[loss=0.08532, simple_loss=0.1048, pruned_loss=0.02272, audio_tagging_loss=0.01021, over 3054945.78 frames. 
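[annotation] grad_scale toggles between 16.0 and 32.0 throughout this stretch, the signature of dynamic fp16 loss scaling: the scale grows after a run of overflow-free steps and is cut back when scaled gradients hit inf/nan. A generic torch.cuda.amp sketch of the mechanism; train_asr.py may additionally manage the scale range itself, and the hyper-parameters here are illustrative:

    # Dynamic fp16 loss scaling behind the oscillating "grad_scale:" values.

    import torch

    def train_step(model, batch, optimizer, scaler):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)     # skipped internally if grads overflowed
        scaler.update()            # grows or backs off the scale
        return scaler.get_scale()  # the value logged as grad_scale

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_factor=2.0,
                                       backoff_factor=0.5)
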
], batch size: 53, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 18:22:25,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=756066.6666666666, ans=0.125 2023-11-19 18:22:43,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=756133.3333333334, ans=0.1 2023-11-19 18:22:46,629 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.196e+01 8.495e+01 9.298e+01 1.017e+02 1.203e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 18:22:54,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=756200.0, ans=15.0 2023-11-19 18:23:01,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=756266.6666666666, ans=0.0 2023-11-19 18:23:11,160 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 113450 2023-11-19 18:23:13,589 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 18:23:21,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=756333.3333333334, ans=0.125 2023-11-19 18:23:22,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=756333.3333333334, ans=0.0 2023-11-19 18:23:23,539 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.57 vs. limit=15.0 2023-11-19 18:23:27,289 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 5250, loss[loss=0.1189, simple_loss=0.1299, pruned_loss=0.04201, audio_tagging_loss=0.01198, over 15381.00 frames. ], tot_loss[loss=0.08668, simple_loss=0.1062, pruned_loss=0.02333, audio_tagging_loss=0.01023, over 3046075.85 frames. ], batch size: 56, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 18:23:30,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=756400.0, ans=0.125 2023-11-19 18:23:32,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=756400.0, ans=0.0 2023-11-19 18:24:14,550 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 113500 2023-11-19 18:24:16,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=756666.6666666666, ans=0.5 2023-11-19 18:24:29,877 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 5300, loss[loss=0.1034, simple_loss=0.1279, pruned_loss=0.03017, audio_tagging_loss=0.009309, over 15000.00 frames. ], tot_loss[loss=0.08592, simple_loss=0.1058, pruned_loss=0.02293, audio_tagging_loss=0.0101, over 3044524.01 frames. ], batch size: 54, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 18:24:46,629 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.99 vs. 
limit=15.0 2023-11-19 18:24:53,118 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 8.322e+01 9.046e+01 9.978e+01 1.366e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-19 18:25:12,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=756933.3333333334, ans=0.0 2023-11-19 18:25:19,299 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 113550 2023-11-19 18:25:26,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=757000.0, ans=0.2 2023-11-19 18:25:29,882 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=15.15 vs. limit=15.0 2023-11-19 18:25:30,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=757000.0, ans=0.0 2023-11-19 18:25:33,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=757066.6666666666, ans=0.2 2023-11-19 18:25:33,895 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 5350, loss[loss=0.07072, simple_loss=0.08345, pruned_loss=0.01493, audio_tagging_loss=0.01406, over 15769.00 frames. ], tot_loss[loss=0.08578, simple_loss=0.1056, pruned_loss=0.02282, audio_tagging_loss=0.01014, over 3043144.30 frames. ], batch size: 61, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 18:25:35,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=757066.6666666666, ans=0.125 2023-11-19 18:25:48,533 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.63 vs. limit=6.0 2023-11-19 18:26:22,816 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 113600 2023-11-19 18:26:25,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=757333.3333333334, ans=0.0 2023-11-19 18:26:39,012 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 5400, loss[loss=0.1078, simple_loss=0.1384, pruned_loss=0.03002, audio_tagging_loss=0.008581, over 14435.00 frames. ], tot_loss[loss=0.08541, simple_loss=0.105, pruned_loss=0.02262, audio_tagging_loss=0.01029, over 3046697.16 frames. ], batch size: 53, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 18:26:45,583 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.24 vs. 
2023-11-19 18:26:57,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=757466.6666666666, ans=0.125
2023-11-19 18:27:02,060 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.928e+01 8.152e+01 8.655e+01 9.837e+01 1.259e+02, threshold=1.731e+02, percent-clipped=0.0
2023-11-19 18:27:14,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=757533.3333333334, ans=0.125
2023-11-19 18:27:21,519 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-19 18:27:28,012 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 113650
2023-11-19 18:27:28,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=757600.0, ans=0.1
2023-11-19 18:27:41,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=757666.6666666666, ans=0.125
2023-11-19 18:27:43,283 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 5450, loss[loss=0.06605, simple_loss=0.08255, pruned_loss=0.01447, audio_tagging_loss=0.0103, over 15434.00 frames. ], tot_loss[loss=0.08565, simple_loss=0.1052, pruned_loss=0.02273, audio_tagging_loss=0.01033, over 3044459.65 frames. ], batch size: 59, lr: 6.95e-03, grad_scale: 32.0
2023-11-19 18:27:46,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=757733.3333333334, ans=0.2
2023-11-19 18:27:48,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=757733.3333333334, ans=0.0
2023-11-19 18:27:50,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=757733.3333333334, ans=0.0
2023-11-19 18:27:55,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=757800.0, ans=0.125
2023-11-19 18:28:20,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=757933.3333333334, ans=0.2
2023-11-19 18:28:28,996 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.83 vs. limit=15.0
2023-11-19 18:28:32,022 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 113700
2023-11-19 18:28:46,427 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 5500, loss[loss=0.07512, simple_loss=0.08856, pruned_loss=0.01924, audio_tagging_loss=0.0116, over 16136.00 frames. ], tot_loss[loss=0.08602, simple_loss=0.1054, pruned_loss=0.0229, audio_tagging_loss=0.01045, over 3052867.38 frames. ], batch size: 59, lr: 6.95e-03, grad_scale: 16.0
2023-11-19 18:28:49,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=758066.6666666666, ans=0.0
2023-11-19 18:28:50,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=758066.6666666666, ans=0.125
2023-11-19 18:29:01,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=758133.3333333334, ans=0.125
2023-11-19 18:29:01,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=758133.3333333334, ans=0.0
2023-11-19 18:29:10,973 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.752e+01 8.267e+01 8.902e+01 9.734e+01 1.914e+02, threshold=1.780e+02, percent-clipped=1.0
2023-11-19 18:29:11,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=758200.0, ans=0.125
2023-11-19 18:29:13,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=758200.0, ans=0.125
2023-11-19 18:29:34,942 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 113750
2023-11-19 18:29:39,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=758333.3333333334, ans=0.2
2023-11-19 18:29:47,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=758333.3333333334, ans=0.2
2023-11-19 18:29:48,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=758333.3333333334, ans=0.5
2023-11-19 18:29:51,188 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 5550, loss[loss=0.1058, simple_loss=0.1264, pruned_loss=0.03364, audio_tagging_loss=0.00898, over 15005.00 frames. ], tot_loss[loss=0.08584, simple_loss=0.1049, pruned_loss=0.02279, audio_tagging_loss=0.01059, over 3053990.80 frames. ], batch size: 55, lr: 6.95e-03, grad_scale: 16.0
2023-11-19 18:29:55,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=758400.0, ans=0.125
2023-11-19 18:29:55,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=758400.0, ans=0.0
2023-11-19 18:29:56,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=758400.0, ans=0.2
2023-11-19 18:30:01,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=758400.0, ans=0.125
2023-11-19 18:30:02,633 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.82 vs. limit=22.5
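In the optim.py records, the five numbers after "grad-norm quartiles" are the minimum, 25th percentile, median, 75th percentile and maximum of recently observed gradient norms, and the logged threshold tracks Clipping_scale times the median (2.0 * 8.902e+01 = 1.780e+02 in the record above); percent-clipped then appears to be the share of recent updates whose norm exceeded that threshold. A hedged sketch of that bookkeeping; the window size and exact percentile handling are assumptions:

```python
# Sketch of the clipping statistics reported at optim.py:476:
# threshold = clipping_scale * median of recent gradient norms.
import numpy as np

clipping_scale = 2.0
# Toy 5-norm history chosen to reproduce the quartiles in the record above;
# the real window is much larger (percent-clipped=1.0 suggests ~1% there).
recent_norms = np.array([67.52, 82.67, 89.02, 97.34, 191.4])

quartiles = np.percentile(recent_norms, [0, 25, 50, 75, 100])
threshold = clipping_scale * quartiles[2]      # 2.0 * 89.02 = 178.04
percent_clipped = 100.0 * np.mean(recent_norms > threshold)

print(quartiles)         # [ 67.52  82.67  89.02  97.34 191.4 ]
print(threshold)         # 178.04, i.e. the logged threshold=1.780e+02
print(percent_clipped)   # only the max (191.4) exceeds the threshold
```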
2023-11-19 18:30:03,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=758466.6666666666, ans=0.0
2023-11-19 18:30:19,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=758533.3333333334, ans=0.2
2023-11-19 18:30:35,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=758600.0, ans=0.07
2023-11-19 18:30:39,426 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 113800
2023-11-19 18:30:39,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=758600.0, ans=0.0
2023-11-19 18:30:39,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=758600.0, ans=0.125
2023-11-19 18:30:53,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=758733.3333333334, ans=15.0
2023-11-19 18:30:54,389 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 5600, loss[loss=0.09937, simple_loss=0.1266, pruned_loss=0.02644, audio_tagging_loss=0.009646, over 14057.00 frames. ], tot_loss[loss=0.08537, simple_loss=0.1042, pruned_loss=0.02256, audio_tagging_loss=0.01071, over 3052911.68 frames. ], batch size: 53, lr: 6.95e-03, grad_scale: 32.0
2023-11-19 18:30:54,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=758733.3333333334, ans=0.2
2023-11-19 18:31:15,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=758800.0, ans=0.1
2023-11-19 18:31:18,671 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.413e+01 8.485e+01 9.378e+01 1.023e+02 2.129e+02, threshold=1.876e+02, percent-clipped=1.0
2023-11-19 18:31:23,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=758866.6666666666, ans=0.125
2023-11-19 18:31:39,018 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 18:31:39,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=758933.3333333334, ans=0.125
2023-11-19 18:31:43,975 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 113850
2023-11-19 18:31:59,338 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 5650, loss[loss=0.08868, simple_loss=0.1115, pruned_loss=0.02301, audio_tagging_loss=0.0099, over 15750.00 frames. ], tot_loss[loss=0.08585, simple_loss=0.105, pruned_loss=0.02268, audio_tagging_loss=0.01066, over 3057454.79 frames. ], batch size: 56, lr: 6.95e-03, grad_scale: 32.0
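The WARNING above drops an AudioSet cut because its 100 feature frames shrink to 23 after the encoder's roughly 4x subsampling, which is fewer than its 24 BPE tokens; a transducer alignment needs at least as many encoder frames as output symbols. A sketch of such a filter, where `subsampled_frames` is a hypothetical stand-in for the encoder's exact frame arithmetic:

```python
# Sketch of the short-cut filter behind the WARNING: cuts whose encoder
# output is shorter than their token sequence cannot be aligned by the
# transducer loss, so they are excluded from training.

def subsampled_frames(num_frames: int) -> int:
    # Assumed arithmetic: two stride-2 stages, each trimming edge frames,
    # chosen so that 100 input frames -> 23 output frames as in the log.
    return ((num_frames - 7) // 2 + 1 - 3) // 2 + 1

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    return subsampled_frames(num_frames) >= num_tokens

print(subsampled_frames(100))  # 23
print(keep_cut(100, 24))       # False -> "Exclude cut ... from training"
```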
2023-11-19 18:32:01,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=759066.6666666666, ans=0.125
2023-11-19 18:32:06,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=759066.6666666666, ans=0.125
2023-11-19 18:32:07,311 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.48 vs. limit=22.5
2023-11-19 18:32:15,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=759133.3333333334, ans=0.125
2023-11-19 18:32:29,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=759200.0, ans=0.0
2023-11-19 18:32:39,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=759266.6666666666, ans=0.125
2023-11-19 18:32:48,353 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 113900
2023-11-19 18:32:53,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=759333.3333333334, ans=0.2
2023-11-19 18:33:04,058 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 5700, loss[loss=0.08488, simple_loss=0.0995, pruned_loss=0.02162, audio_tagging_loss=0.01351, over 15790.00 frames. ], tot_loss[loss=0.08508, simple_loss=0.1038, pruned_loss=0.02247, audio_tagging_loss=0.0107, over 3055667.42 frames. ], batch size: 58, lr: 6.94e-03, grad_scale: 16.0
2023-11-19 18:33:14,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=759400.0, ans=22.5
2023-11-19 18:33:27,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=759466.6666666666, ans=0.1
2023-11-19 18:33:29,971 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.506e+01 8.509e+01 9.324e+01 1.031e+02 1.317e+02, threshold=1.865e+02, percent-clipped=0.0
2023-11-19 18:33:37,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=759533.3333333334, ans=0.0
2023-11-19 18:33:53,316 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 113950
2023-11-19 18:33:59,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=759666.6666666666, ans=0.2
2023-11-19 18:34:08,411 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 5750, loss[loss=0.08184, simple_loss=0.1075, pruned_loss=0.02057, audio_tagging_loss=0.007508, over 15871.00 frames. ], tot_loss[loss=0.08492, simple_loss=0.1038, pruned_loss=0.02253, audio_tagging_loss=0.01049, over 3060073.30 frames. ], batch size: 58, lr: 6.94e-03, grad_scale: 16.0
2023-11-19 18:34:13,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=759733.3333333334, ans=0.125
2023-11-19 18:34:21,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=759800.0, ans=0.1
2023-11-19 18:34:22,112 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.65 vs. limit=15.0
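The grad_scale field in the batch summaries is the dynamic loss scale of mixed-precision training: it halved from 32.0 to 16.0 between the batch 5650 and 5700 summaries (typically after an inf/nan gradient was detected) and grows back after a run of clean steps. A generic torch.cuda.amp sketch of that mechanism, not necessarily this recipe's exact scaler management:

```python
# Generic dynamic loss-scaling loop with torch.cuda.amp: a sketch of the
# behaviour behind the fluctuating "grad_scale" values in the log.
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

def train_step(model, optimizer, batch, device="cuda"):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(batch.to(device))   # hypothetical: model returns the loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # the step is skipped if inf/nan grads are found
    scaler.update()          # halves the scale on overflow, else slowly grows
    return scaler.get_scale()
```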
2023-11-19 18:34:30,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=759800.0, ans=0.125
2023-11-19 18:34:46,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=759933.3333333334, ans=0.125
2023-11-19 18:34:57,385 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 114000
2023-11-19 18:35:12,682 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 5800, loss[loss=0.09612, simple_loss=0.1258, pruned_loss=0.02546, audio_tagging_loss=0.007748, over 14216.00 frames. ], tot_loss[loss=0.08551, simple_loss=0.1048, pruned_loss=0.02273, audio_tagging_loss=0.01036, over 3054838.82 frames. ], batch size: 52, lr: 6.94e-03, grad_scale: 16.0
2023-11-19 18:35:21,790 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=15.0
2023-11-19 18:35:21,915 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.59 vs. limit=22.5
2023-11-19 18:35:30,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=760133.3333333334, ans=0.2
2023-11-19 18:35:38,825 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.740e+01 8.284e+01 9.012e+01 9.674e+01 1.297e+02, threshold=1.802e+02, percent-clipped=0.0
2023-11-19 18:35:51,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=760266.6666666666, ans=0.125
2023-11-19 18:35:51,890 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0
2023-11-19 18:35:52,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=760266.6666666666, ans=0.1
2023-11-19 18:35:57,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=760266.6666666666, ans=0.125
2023-11-19 18:36:01,854 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.46 vs. limit=15.0
2023-11-19 18:36:02,356 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 114050
2023-11-19 18:36:17,844 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 5850, loss[loss=0.08889, simple_loss=0.1072, pruned_loss=0.02596, audio_tagging_loss=0.009307, over 15356.00 frames. ], tot_loss[loss=0.08531, simple_loss=0.1045, pruned_loss=0.02277, audio_tagging_loss=0.01031, over 3051454.54 frames. ], batch size: 59, lr: 6.94e-03, grad_scale: 16.0
2023-11-19 18:36:23,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=760400.0, ans=0.0
2023-11-19 18:36:24,208 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.44 vs. limit=12.0
2023-11-19 18:36:33,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=760466.6666666666, ans=0.0
2023-11-19 18:37:07,071 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 114100
2023-11-19 18:37:09,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=760666.6666666666, ans=0.125
2023-11-19 18:37:13,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=760666.6666666666, ans=0.125
2023-11-19 18:37:22,316 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 5900, loss[loss=0.07573, simple_loss=0.08637, pruned_loss=0.01952, audio_tagging_loss=0.01303, over 14174.00 frames. ], tot_loss[loss=0.0856, simple_loss=0.105, pruned_loss=0.02287, audio_tagging_loss=0.01022, over 3051548.02 frames. ], batch size: 53, lr: 6.94e-03, grad_scale: 16.0
2023-11-19 18:37:36,630 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.74 vs. limit=10.0
2023-11-19 18:37:41,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=760800.0, ans=0.2
2023-11-19 18:37:47,445 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.904e+01 8.348e+01 9.268e+01 1.091e+02 1.395e+02, threshold=1.854e+02, percent-clipped=0.0
2023-11-19 18:37:50,935 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.43 vs. limit=15.0
2023-11-19 18:37:53,946 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.45 vs. limit=22.5
2023-11-19 18:38:11,893 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 114150
2023-11-19 18:38:13,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=761000.0, ans=0.125
2023-11-19 18:38:26,617 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 5950, loss[loss=0.08872, simple_loss=0.1094, pruned_loss=0.02091, audio_tagging_loss=0.01313, over 14741.00 frames. ], tot_loss[loss=0.08543, simple_loss=0.1048, pruned_loss=0.02275, audio_tagging_loss=0.01028, over 3060416.21 frames. ], batch size: 55, lr: 6.94e-03, grad_scale: 16.0
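The slow decay of the lr field is consistent with the Eden schedule used by Zipformer recipes, assuming the values this run logged at startup (base_lr=0.045, lr_batches=7500, lr_epochs=3.5) and taking epoch=9 for the current pass; per-step details such as reference-duration scaling may differ:

```python
# Consistency check of the logged learning rate against the Eden schedule:
#   lr = base_lr * ((step^2 + lr_batches^2) / lr_batches^2) ** -0.25
#               * ((epoch^2 + lr_epochs^2) / lr_epochs^2)   ** -0.25
# Taking epoch=9 here is an assumption that happens to reproduce the log.
base_lr, lr_batches, lr_epochs = 0.045, 7500.0, 3.5
step, epoch = 113450, 9   # "Current batch idx: 113450" early in this excerpt

lr = (base_lr
      * ((step**2 + lr_batches**2) / lr_batches**2) ** -0.25
      * ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25)
print(f"{lr:.2e}")  # ~6.96e-03, matching "lr: 6.96e-03" at that batch idx
```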
2023-11-19 18:38:29,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=761066.6666666666, ans=0.0
2023-11-19 18:38:32,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=761066.6666666666, ans=0.0
2023-11-19 18:38:33,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=761066.6666666666, ans=0.0
2023-11-19 18:38:33,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=761066.6666666666, ans=0.125
2023-11-19 18:38:37,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=761066.6666666666, ans=0.125
2023-11-19 18:38:43,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=761133.3333333334, ans=0.125
2023-11-19 18:38:54,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=761200.0, ans=0.1
2023-11-19 18:38:54,460 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.76 vs. limit=15.0
2023-11-19 18:39:11,571 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.26 vs. limit=15.0
2023-11-19 18:39:13,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=761266.6666666666, ans=0.125
2023-11-19 18:39:15,911 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 114200
2023-11-19 18:39:16,570 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.06 vs. limit=6.0
2023-11-19 18:39:27,325 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.78 vs. limit=15.0
2023-11-19 18:39:28,273 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.96 vs. limit=12.0
2023-11-19 18:39:31,890 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 6000, loss[loss=0.1037, simple_loss=0.1258, pruned_loss=0.03134, audio_tagging_loss=0.009448, over 15176.00 frames. ], tot_loss[loss=0.08642, simple_loss=0.1061, pruned_loss=0.02314, audio_tagging_loss=0.01024, over 3055264.98 frames. ], batch size: 57, lr: 6.93e-03, grad_scale: 32.0
2023-11-19 18:39:31,893 INFO [train_asr.py:1285] (0/4) Computing validation loss
2023-11-19 18:40:12,642 INFO [train_asr.py:1294] (0/4) Epoch 10, validation: loss=0.06357, simple_loss=0.05534, pruned_loss=0.006382, audio_tagging_loss=0.02952, over 4681554.00 frames.
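Every 3000 batches the run pauses to compute a validation loss (the two train_asr.py records above), then reports peak GPU memory. A hedged sketch of such a pass; the per-batch API here is illustrative, not train_asr.py's actual signature:

```python
# Sketch of a validation pass with gradients disabled, followed by the
# peak-memory report seen in the next record. Illustrative names only.
import torch

@torch.no_grad()
def compute_validation_loss(model, valid_loader):
    model.eval()
    total, frames = 0.0, 0
    for batch in valid_loader:
        loss, num_frames = model(batch)      # hypothetical per-batch API
        total += loss.item() * num_frames
        frames += num_frames
    model.train()
    return total / frames                    # frame-weighted average loss

# The recipe then logs peak memory, e.g. "Maximum memory ... 25925MB":
peak_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)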
2023-11-19 18:40:12,643 INFO [train_asr.py:1295] (0/4) Maximum memory allocated so far is 25925MB
2023-11-19 18:40:37,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=761533.3333333334, ans=0.125
2023-11-19 18:40:38,556 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.626e+01 8.242e+01 9.055e+01 9.883e+01 1.211e+02, threshold=1.811e+02, percent-clipped=0.0
2023-11-19 18:40:58,303 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 18:41:02,039 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 114250
2023-11-19 18:41:07,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=761666.6666666666, ans=0.125
2023-11-19 18:41:07,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=761666.6666666666, ans=0.125
2023-11-19 18:41:09,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=761666.6666666666, ans=0.0
2023-11-19 18:41:17,063 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 6050, loss[loss=0.09105, simple_loss=0.1125, pruned_loss=0.02543, audio_tagging_loss=0.009376, over 15712.00 frames. ], tot_loss[loss=0.08678, simple_loss=0.1067, pruned_loss=0.02324, audio_tagging_loss=0.01017, over 3056932.23 frames. ], batch size: 58, lr: 6.93e-03, grad_scale: 32.0
2023-11-19 18:41:18,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=761733.3333333334, ans=0.0
2023-11-19 18:41:18,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=761733.3333333334, ans=0.125
2023-11-19 18:41:21,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=761733.3333333334, ans=0.1
2023-11-19 18:41:33,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=761800.0, ans=0.0
2023-11-19 18:41:59,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=761933.3333333334, ans=0.125
2023-11-19 18:42:06,889 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 114300
2023-11-19 18:42:14,978 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.85 vs. limit=15.0
2023-11-19 18:42:16,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=762000.0, ans=0.05
2023-11-19 18:42:23,029 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 6100, loss[loss=0.0823, simple_loss=0.0976, pruned_loss=0.01949, audio_tagging_loss=0.01401, over 14094.00 frames. ], tot_loss[loss=0.08608, simple_loss=0.1057, pruned_loss=0.02296, audio_tagging_loss=0.01025, over 3052404.16 frames. ], batch size: 55, lr: 6.93e-03, grad_scale: 32.0
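The Whitening records (scaling.py:1022) are diagnostics from Whiten modules: each reports a whiteness metric of a block's activations against that module's whitening_limit, with the corrective gradient only becoming active once the metric exceeds the limit (as in metric=15.85 vs. limit=15.0 above). The sketch below uses one plausible proxy for such a metric, the largest eigenvalue of the channel covariance relative to the mean eigenvalue; the exact formula in scaling.py may differ:

```python
# Hedged sketch of a whiteness diagnostic: how far the channel covariance
# is from a multiple of the identity. The eigenvalue-ratio proxy here is
# an assumption for illustration, not the exact metric in scaling.py.
import torch

def whiteness_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)        # real eigenvalues, ascending
    return (eigs[-1] / eigs.mean()).item()   # 1.0 == perfectly white

white = torch.randn(1000, 384)                   # ~identity covariance
skewed = white * torch.linspace(0.1, 3.0, 384)   # lopsided channel variances
print(whiteness_metric(white))   # ~2.6 here (finite-sample noise)
print(whiteness_metric(skewed))  # several times larger; nears a limit
```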
2023-11-19 18:42:29,219 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=22.5
2023-11-19 18:42:47,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=762200.0, ans=0.0
2023-11-19 18:42:48,699 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.974e+01 8.508e+01 9.449e+01 1.032e+02 1.447e+02, threshold=1.890e+02, percent-clipped=0.0
2023-11-19 18:42:52,174 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.96 vs. limit=22.5
2023-11-19 18:43:13,165 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 114350
2023-11-19 18:43:28,441 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 6150, loss[loss=0.06497, simple_loss=0.08014, pruned_loss=0.01405, audio_tagging_loss=0.01085, over 15044.00 frames. ], tot_loss[loss=0.08579, simple_loss=0.105, pruned_loss=0.02296, audio_tagging_loss=0.0103, over 3045800.84 frames. ], batch size: 57, lr: 6.93e-03, grad_scale: 32.0
2023-11-19 18:43:34,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=762400.0, ans=0.0
2023-11-19 18:43:49,330 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.06 vs. limit=22.5
2023-11-19 18:44:18,109 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 114400
2023-11-19 18:44:33,324 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 6200, loss[loss=0.09387, simple_loss=0.1134, pruned_loss=0.02493, audio_tagging_loss=0.01223, over 15310.00 frames. ], tot_loss[loss=0.0854, simple_loss=0.1044, pruned_loss=0.02276, audio_tagging_loss=0.01043, over 3052177.39 frames. ], batch size: 56, lr: 6.93e-03, grad_scale: 32.0
2023-11-19 18:45:00,755 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.911e+01 8.346e+01 9.010e+01 9.734e+01 1.303e+02, threshold=1.802e+02, percent-clipped=0.0
2023-11-19 18:45:23,149 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 114450
2023-11-19 18:45:39,180 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 6250, loss[loss=0.07729, simple_loss=0.09538, pruned_loss=0.02084, audio_tagging_loss=0.008765, over 14706.00 frames. ], tot_loss[loss=0.08564, simple_loss=0.1045, pruned_loss=0.0229, audio_tagging_loss=0.0105, over 3051209.16 frames. ], batch size: 54, lr: 6.93e-03, grad_scale: 32.0
2023-11-19 18:45:52,566 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0
2023-11-19 18:46:06,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=763200.0, ans=10.0
2023-11-19 18:46:26,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=763266.6666666666, ans=0.125
2023-11-19 18:46:28,954 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 114500
2023-11-19 18:46:33,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=763333.3333333334, ans=0.0
2023-11-19 18:46:45,252 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 6300, loss[loss=0.06422, simple_loss=0.06877, pruned_loss=0.01546, audio_tagging_loss=0.01438, over 14498.00 frames. ], tot_loss[loss=0.08573, simple_loss=0.1046, pruned_loss=0.02282, audio_tagging_loss=0.01058, over 3047661.78 frames. ], batch size: 54, lr: 6.93e-03, grad_scale: 32.0
2023-11-19 18:46:53,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=763400.0, ans=0.0
2023-11-19 18:46:59,774 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.51 vs. limit=22.5
2023-11-19 18:47:07,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=763466.6666666666, ans=0.125
2023-11-19 18:47:09,918 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.137e+01 8.385e+01 9.179e+01 1.044e+02 1.360e+02, threshold=1.836e+02, percent-clipped=0.0
2023-11-19 18:47:27,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=763600.0, ans=0.125
2023-11-19 18:47:31,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=763600.0, ans=0.125
2023-11-19 18:47:34,926 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 114550
2023-11-19 18:47:35,532 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.39 vs. limit=22.5
2023-11-19 18:47:38,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=763666.6666666666, ans=0.2
2023-11-19 18:47:49,855 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 6350, loss[loss=0.1014, simple_loss=0.1198, pruned_loss=0.0264, audio_tagging_loss=0.01511, over 15032.00 frames. ], tot_loss[loss=0.08526, simple_loss=0.104, pruned_loss=0.0226, audio_tagging_loss=0.01065, over 3053495.31 frames. ], batch size: 56, lr: 6.92e-03, grad_scale: 32.0
2023-11-19 18:47:55,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=763733.3333333334, ans=0.0
2023-11-19 18:48:03,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=763800.0, ans=0.2
2023-11-19 18:48:10,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=763800.0, ans=0.125
2023-11-19 18:48:14,330 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.62 vs. limit=15.0
2023-11-19 18:48:38,865 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 114600
2023-11-19 18:48:43,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=764000.0, ans=0.1
2023-11-19 18:48:54,853 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 6400, loss[loss=0.08704, simple_loss=0.1059, pruned_loss=0.02203, audio_tagging_loss=0.01206, over 16570.00 frames. ], tot_loss[loss=0.08597, simple_loss=0.1048, pruned_loss=0.02286, audio_tagging_loss=0.01071, over 3047653.52 frames. ], batch size: 62, lr: 6.92e-03, grad_scale: 32.0
2023-11-19 18:49:15,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=764133.3333333334, ans=0.1
2023-11-19 18:49:21,742 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.802e+01 8.099e+01 8.680e+01 9.158e+01 1.578e+02, threshold=1.736e+02, percent-clipped=0.0
2023-11-19 18:49:34,750 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.45 vs. limit=15.0
2023-11-19 18:49:44,823 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 114650
2023-11-19 18:49:49,577 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.23 vs. limit=12.0
2023-11-19 18:50:01,273 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 6450, loss[loss=0.07202, simple_loss=0.09287, pruned_loss=0.01889, audio_tagging_loss=0.006698, over 15309.00 frames. ], tot_loss[loss=0.08597, simple_loss=0.1048, pruned_loss=0.02278, audio_tagging_loss=0.01081, over 3048432.26 frames. ], batch size: 56, lr: 6.92e-03, grad_scale: 32.0
2023-11-19 18:50:04,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=764400.0, ans=0.125
2023-11-19 18:50:06,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=764400.0, ans=0.1
2023-11-19 18:50:10,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=764400.0, ans=0.09899494936611666
2023-11-19 18:50:11,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=764400.0, ans=0.0
2023-11-19 18:50:16,493 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.102e-03
2023-11-19 18:50:22,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=764466.6666666666, ans=0.125
2023-11-19 18:50:39,099 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.05 vs. limit=15.0
2023-11-19 18:50:50,165 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 114700
2023-11-19 18:51:05,448 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.28 vs. limit=15.0
2023-11-19 18:51:05,828 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 6500, loss[loss=0.08876, simple_loss=0.106, pruned_loss=0.02411, audio_tagging_loss=0.01166, over 15175.00 frames. ], tot_loss[loss=0.08535, simple_loss=0.104, pruned_loss=0.02259, audio_tagging_loss=0.01077, over 3049335.23 frames. ], batch size: 56, lr: 6.92e-03, grad_scale: 32.0
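The WithLoss records (scaling.py:1118) report the summed auxiliary penalty attached to a submodule's activations; loss-sum=0.000e+00 means the penalty was inactive on the logged batch, while the 5.102e-03 on the attention weights above shows a small active one. A hedged sketch of attaching a penalty that leaves the forward value unchanged and only injects a gradient in backward; the penalty form and scale here are illustrative:

```python
# Sketch of a pass-through penalty in the spirit of the WithLoss
# diagnostics: forward returns x unchanged, backward adds the gradient of
# a penalty on values exceeding a limit, and the penalty's summed value is
# what a "loss-sum=..." record would report. Illustrative, not scaling.py.
import torch

class WithPenalty(torch.autograd.Function):
    last_loss_sum = 0.0  # what the log line would report

    @staticmethod
    def forward(ctx, x, limit: float):
        ctx.save_for_backward(x)
        ctx.limit = limit
        # Total overshoot beyond the limit: sum(relu(|x| - limit)).
        WithPenalty.last_loss_sum = (x.abs() - limit).clamp(min=0).sum().item()
        return x  # the forward output is unchanged

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # d/dx sum(relu(|x| - limit)) = sign(x) wherever |x| > limit.
        penalty_grad = torch.sign(x) * (x.abs() > ctx.limit)
        return grad_out + 1e-3 * penalty_grad, None  # 1e-3: illustrative scale

x = torch.randn(4, 8, requires_grad=True)
y = WithPenalty.apply(x, 2.5)
print(WithPenalty.last_loss_sum)  # 0.0 when nothing exceeds the limit
```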
2023-11-19 18:51:06,461 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.01 vs. limit=22.5
2023-11-19 18:51:08,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=764733.3333333334, ans=0.125
2023-11-19 18:51:31,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=764866.6666666666, ans=0.1
2023-11-19 18:51:32,083 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.963e+01 8.435e+01 9.152e+01 1.009e+02 1.379e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-19 18:51:40,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=764866.6666666666, ans=0.125
2023-11-19 18:51:56,217 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 114750
2023-11-19 18:51:57,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=765000.0, ans=0.0
2023-11-19 18:51:57,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=765000.0, ans=0.125
2023-11-19 18:52:11,138 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 6550, loss[loss=0.06838, simple_loss=0.09324, pruned_loss=0.01294, audio_tagging_loss=0.008818, over 14811.00 frames. ], tot_loss[loss=0.08555, simple_loss=0.1045, pruned_loss=0.02274, audio_tagging_loss=0.01055, over 3048603.37 frames. ], batch size: 56, lr: 6.92e-03, grad_scale: 32.0
2023-11-19 18:52:28,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=765133.3333333334, ans=0.0
2023-11-19 18:52:39,283 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.32 vs. limit=15.0
2023-11-19 18:52:46,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=765200.0, ans=0.5
2023-11-19 18:52:48,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=765200.0, ans=0.2
2023-11-19 18:52:48,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=765200.0, ans=0.05
2023-11-19 18:53:00,318 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0
2023-11-19 18:53:00,806 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 114800
2023-11-19 18:53:08,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=765333.3333333334, ans=0.125
2023-11-19 18:53:17,396 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 6600, loss[loss=0.07179, simple_loss=0.09338, pruned_loss=0.01535, audio_tagging_loss=0.00975, over 15903.00 frames. ], tot_loss[loss=0.08581, simple_loss=0.1054, pruned_loss=0.02277, audio_tagging_loss=0.01035, over 3053149.92 frames. ], batch size: 62, lr: 6.92e-03, grad_scale: 32.0
2023-11-19 18:53:40,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=765466.6666666666, ans=0.2
2023-11-19 18:53:42,957 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.143e+01 8.495e+01 9.015e+01 9.763e+01 1.318e+02, threshold=1.803e+02, percent-clipped=0.0
2023-11-19 18:54:07,119 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 114850
2023-11-19 18:54:17,315 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.19 vs. limit=15.0
2023-11-19 18:54:22,840 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 6650, loss[loss=0.06901, simple_loss=0.08399, pruned_loss=0.01586, audio_tagging_loss=0.01116, over 16547.00 frames. ], tot_loss[loss=0.08552, simple_loss=0.1051, pruned_loss=0.02262, audio_tagging_loss=0.01035, over 3054554.83 frames. ], batch size: 63, lr: 6.92e-03, grad_scale: 16.0
2023-11-19 18:54:45,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=765800.0, ans=0.07
2023-11-19 18:54:54,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=765866.6666666666, ans=0.125
2023-11-19 18:55:01,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=765933.3333333334, ans=0.1
2023-11-19 18:55:12,764 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 114900
2023-11-19 18:55:18,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=766000.0, ans=0.125
2023-11-19 18:55:27,726 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 6700, loss[loss=0.06483, simple_loss=0.08167, pruned_loss=0.01446, audio_tagging_loss=0.009538, over 14889.00 frames. ], tot_loss[loss=0.08559, simple_loss=0.1053, pruned_loss=0.02267, audio_tagging_loss=0.01029, over 3041202.61 frames. ], batch size: 56, lr: 6.91e-03, grad_scale: 16.0
2023-11-19 18:55:34,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=766066.6666666666, ans=0.07
2023-11-19 18:55:40,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=766133.3333333334, ans=0.0
2023-11-19 18:55:55,859 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.722e+01 8.369e+01 9.025e+01 9.789e+01 1.375e+02, threshold=1.805e+02, percent-clipped=0.0
2023-11-19 18:56:17,634 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 114950
2023-11-19 18:56:30,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=766333.3333333334, ans=0.125
2023-11-19 18:56:34,538 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 6750, loss[loss=0.08742, simple_loss=0.103, pruned_loss=0.02407, audio_tagging_loss=0.01184, over 14667.00 frames. ], tot_loss[loss=0.08512, simple_loss=0.1044, pruned_loss=0.02261, audio_tagging_loss=0.01029, over 3039142.25 frames. ], batch size: 55, lr: 6.91e-03, grad_scale: 16.0
2023-11-19 18:56:53,553 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.14 vs. limit=15.0
2023-11-19 18:57:13,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=766600.0, ans=0.125
2023-11-19 18:57:17,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=766600.0, ans=0.0
2023-11-19 18:57:24,363 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 115000
2023-11-19 18:57:28,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=766666.6666666666, ans=0.04949747468305833
2023-11-19 18:57:39,588 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 6800, loss[loss=0.09836, simple_loss=0.1213, pruned_loss=0.02863, audio_tagging_loss=0.009088, over 16059.00 frames. ], tot_loss[loss=0.08549, simple_loss=0.1047, pruned_loss=0.0228, audio_tagging_loss=0.01032, over 3044228.20 frames. ], batch size: 61, lr: 6.91e-03, grad_scale: 32.0
2023-11-19 18:58:03,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=766800.0, ans=0.2
2023-11-19 18:58:07,410 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.734e+01 8.232e+01 9.131e+01 9.667e+01 1.376e+02, threshold=1.826e+02, percent-clipped=0.0
2023-11-19 18:58:10,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=766866.6666666666, ans=0.125
2023-11-19 18:58:29,274 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 115050
2023-11-19 18:58:44,796 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 6850, loss[loss=0.09549, simple_loss=0.1207, pruned_loss=0.02576, audio_tagging_loss=0.009402, over 15560.00 frames. ], tot_loss[loss=0.08466, simple_loss=0.1037, pruned_loss=0.02245, audio_tagging_loss=0.01034, over 3040232.50 frames. ], batch size: 59, lr: 6.91e-03, grad_scale: 32.0
2023-11-19 18:59:34,649 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 115100
2023-11-19 18:59:38,738 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.00 vs. limit=15.0
2023-11-19 18:59:44,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=767333.3333333334, ans=0.125
2023-11-19 18:59:50,465 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 6900, loss[loss=0.1066, simple_loss=0.1195, pruned_loss=0.03427, audio_tagging_loss=0.01259, over 15391.00 frames. ], tot_loss[loss=0.08488, simple_loss=0.1045, pruned_loss=0.02237, audio_tagging_loss=0.01028, over 3044420.11 frames. ], batch size: 56, lr: 6.91e-03, grad_scale: 32.0
2023-11-19 18:59:59,127 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.64 vs. limit=15.0
2023-11-19 18:59:59,294 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.83 vs. limit=15.0
2023-11-19 19:00:16,996 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.004e+01 8.111e+01 8.753e+01 9.460e+01 1.253e+02, threshold=1.751e+02, percent-clipped=0.0
2023-11-19 19:00:23,320 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.73 vs. limit=15.0
2023-11-19 19:00:23,380 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.23 vs. limit=22.5
2023-11-19 19:00:27,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=767600.0, ans=0.0
2023-11-19 19:00:31,823 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.83 vs. limit=10.0
2023-11-19 19:00:34,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=767600.0, ans=0.125
2023-11-19 19:00:40,062 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 19:00:40,162 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 115150
2023-11-19 19:00:41,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=767666.6666666666, ans=0.125
2023-11-19 19:00:45,034 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.13 vs. limit=12.0
2023-11-19 19:00:49,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=767666.6666666666, ans=0.2
2023-11-19 19:00:55,480 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 6950, loss[loss=0.07438, simple_loss=0.08962, pruned_loss=0.0183, audio_tagging_loss=0.01127, over 15369.00 frames. ], tot_loss[loss=0.08499, simple_loss=0.1048, pruned_loss=0.02234, audio_tagging_loss=0.01025, over 3042012.94 frames. ], batch size: 57, lr: 6.91e-03, grad_scale: 32.0
2023-11-19 19:01:06,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=767733.3333333334, ans=0.2
2023-11-19 19:01:38,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=767933.3333333334, ans=0.07
2023-11-19 19:01:40,098 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.50 vs. limit=15.0
2023-11-19 19:01:43,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=767933.3333333334, ans=0.125
2023-11-19 19:01:45,752 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 115200
2023-11-19 19:02:00,904 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 7000, loss[loss=0.08558, simple_loss=0.1001, pruned_loss=0.02268, audio_tagging_loss=0.01284, over 14445.00 frames. ], tot_loss[loss=0.08477, simple_loss=0.1041, pruned_loss=0.02234, audio_tagging_loss=0.01038, over 3036931.78 frames. ], batch size: 56, lr: 6.90e-03, grad_scale: 32.0
2023-11-19 19:02:07,000 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.93 vs. limit=15.0
2023-11-19 19:02:29,329 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.682e+01 8.339e+01 9.135e+01 1.019e+02 1.398e+02, threshold=1.827e+02, percent-clipped=0.0
2023-11-19 19:02:36,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=768200.0, ans=0.125
2023-11-19 19:02:37,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=768200.0, ans=0.1
2023-11-19 19:02:45,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=768266.6666666666, ans=0.09899494936611666
2023-11-19 19:02:50,610 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 115250
2023-11-19 19:02:55,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=768333.3333333334, ans=0.05
2023-11-19 19:02:59,199 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.75 vs. limit=15.0
2023-11-19 19:03:07,200 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 7050, loss[loss=0.06193, simple_loss=0.07942, pruned_loss=0.0116, audio_tagging_loss=0.01061, over 14621.00 frames. ], tot_loss[loss=0.08457, simple_loss=0.1039, pruned_loss=0.0222, audio_tagging_loss=0.01042, over 3033814.50 frames. ], batch size: 55, lr: 6.90e-03, grad_scale: 32.0
2023-11-19 19:03:08,037 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0
2023-11-19 19:03:09,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=768400.0, ans=0.125
2023-11-19 19:03:41,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=768533.3333333334, ans=0.1
2023-11-19 19:03:41,968 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.94 vs. limit=15.0
2023-11-19 19:03:48,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=768600.0, ans=0.125
2023-11-19 19:03:56,501 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 115300
2023-11-19 19:04:11,871 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 7100, loss[loss=0.09108, simple_loss=0.1129, pruned_loss=0.02266, audio_tagging_loss=0.01197, over 16498.00 frames. ], tot_loss[loss=0.08511, simple_loss=0.1045, pruned_loss=0.02241, audio_tagging_loss=0.01046, over 3049454.55 frames. ], batch size: 63, lr: 6.90e-03, grad_scale: 32.0
2023-11-19 19:04:33,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=768800.0, ans=0.0
2023-11-19 19:04:38,352 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.824e+01 8.390e+01 9.120e+01 9.831e+01 1.700e+02, threshold=1.824e+02, percent-clipped=0.0
2023-11-19 19:04:42,264 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.49 vs. limit=22.5
2023-11-19 19:05:01,398 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 115350
2023-11-19 19:05:16,473 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 7150, loss[loss=0.0676, simple_loss=0.08372, pruned_loss=0.01474, audio_tagging_loss=0.011, over 15106.00 frames. ], tot_loss[loss=0.08547, simple_loss=0.1049, pruned_loss=0.02263, audio_tagging_loss=0.0104, over 3050460.09 frames. ], batch size: 56, lr: 6.90e-03, grad_scale: 32.0
2023-11-19 19:05:45,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=769200.0, ans=0.2
2023-11-19 19:05:46,925 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0
2023-11-19 19:06:06,713 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 115400
2023-11-19 19:06:06,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=769266.6666666666, ans=0.0
2023-11-19 19:06:08,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=769333.3333333334, ans=0.2
2023-11-19 19:06:09,144 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.75 vs. limit=15.0
2023-11-19 19:06:23,056 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 7200, loss[loss=0.1009, simple_loss=0.1234, pruned_loss=0.03121, audio_tagging_loss=0.008003, over 15208.00 frames. ], tot_loss[loss=0.08576, simple_loss=0.105, pruned_loss=0.02283, audio_tagging_loss=0.01041, over 3046054.14 frames. ], batch size: 56, lr: 6.90e-03, grad_scale: 32.0
2023-11-19 19:06:44,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=769466.6666666666, ans=0.125
2023-11-19 19:06:50,185 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.178e+01 8.420e+01 9.034e+01 9.720e+01 1.175e+02, threshold=1.807e+02, percent-clipped=0.0
2023-11-19 19:07:05,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=769600.0, ans=0.125
2023-11-19 19:07:13,133 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 115450
2023-11-19 19:07:29,100 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 7250, loss[loss=0.08885, simple_loss=0.114, pruned_loss=0.02142, audio_tagging_loss=0.01044, over 14803.00 frames. ], tot_loss[loss=0.08679, simple_loss=0.1062, pruned_loss=0.02319, audio_tagging_loss=0.01049, over 3046832.83 frames. ], batch size: 56, lr: 6.90e-03, grad_scale: 32.0
2023-11-19 19:07:42,294 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.67 vs. limit=10.0
2023-11-19 19:07:48,706 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.05 vs. limit=22.5
2023-11-19 19:08:02,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=769866.6666666666, ans=0.2
2023-11-19 19:08:18,645 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 115500
2023-11-19 19:08:23,078 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.47 vs. limit=15.0
2023-11-19 19:08:28,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=770000.0, ans=0.0
2023-11-19 19:08:33,595 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 7300, loss[loss=0.06293, simple_loss=0.07785, pruned_loss=0.01323, audio_tagging_loss=0.01078, over 15285.00 frames. ], tot_loss[loss=0.08658, simple_loss=0.1062, pruned_loss=0.02316, audio_tagging_loss=0.01035, over 3051027.66 frames. ], batch size: 58, lr: 6.90e-03, grad_scale: 32.0
2023-11-19 19:08:35,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=770066.6666666666, ans=0.0
2023-11-19 19:08:36,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=770066.6666666666, ans=0.125
2023-11-19 19:09:02,030 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.990e+01 8.445e+01 8.971e+01 9.866e+01 1.829e+02, threshold=1.794e+02, percent-clipped=1.0
2023-11-19 19:09:06,364 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.49 vs. limit=12.0
2023-11-19 19:09:22,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=770266.6666666666, ans=0.125
2023-11-19 19:09:23,298 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 115550
2023-11-19 19:09:31,118 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.09 vs. limit=15.0
2023-11-19 19:09:37,902 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 7350, loss[loss=0.09382, simple_loss=0.1188, pruned_loss=0.02721, audio_tagging_loss=0.007209, over 14439.00 frames. ], tot_loss[loss=0.08615, simple_loss=0.1059, pruned_loss=0.02293, audio_tagging_loss=0.01025, over 3046423.71 frames. ], batch size: 53, lr: 6.89e-03, grad_scale: 32.0
2023-11-19 19:09:45,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=770400.0, ans=0.125
2023-11-19 19:10:26,785 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 115600
2023-11-19 19:10:36,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=770666.6666666666, ans=0.1
2023-11-19 19:10:42,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=770666.6666666666, ans=0.1
2023-11-19 19:10:44,519 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 7400, loss[loss=0.09974, simple_loss=0.1157, pruned_loss=0.03211, audio_tagging_loss=0.009763, over 13562.00 frames. ], tot_loss[loss=0.0856, simple_loss=0.1054, pruned_loss=0.02272, audio_tagging_loss=0.01016, over 3044796.65 frames. ], batch size: 52, lr: 6.89e-03, grad_scale: 16.0
2023-11-19 19:10:44,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=770733.3333333334, ans=0.0
2023-11-19 19:10:44,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=770733.3333333334, ans=0.125
2023-11-19 19:10:54,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=770733.3333333334, ans=0.95
2023-11-19 19:11:03,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=770800.0, ans=0.125
2023-11-19 19:11:09,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=770866.6666666666, ans=0.125
2023-11-19 19:11:11,825 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 8.500e+01 9.123e+01 1.022e+02 1.403e+02, threshold=1.825e+02, percent-clipped=0.0
2023-11-19 19:11:15,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=770866.6666666666, ans=0.125
2023-11-19 19:11:20,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=770866.6666666666, ans=0.1
2023-11-19 19:11:34,143 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 115650
2023-11-19 19:11:36,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=771000.0, ans=10.0
2023-11-19 19:11:47,213 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0
2023-11-19 19:11:49,070 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 7450, loss[loss=0.06732, simple_loss=0.09114, pruned_loss=0.01541, audio_tagging_loss=0.006342, over 15603.00 frames. ], tot_loss[loss=0.08574, simple_loss=0.1059, pruned_loss=0.02273, audio_tagging_loss=0.01006, over 3048722.50 frames. ], batch size: 58, lr: 6.89e-03, grad_scale: 16.0
2023-11-19 19:12:00,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=771133.3333333334, ans=0.0
2023-11-19 19:12:18,544 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.44 vs. limit=15.0
2023-11-19 19:12:37,762 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 115700
2023-11-19 19:12:41,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=771333.3333333334, ans=0.1
2023-11-19 19:12:41,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=771333.3333333334, ans=0.125
2023-11-19 19:12:52,569 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 7500, loss[loss=0.08496, simple_loss=0.09655, pruned_loss=0.02555, audio_tagging_loss=0.01113, over 16033.00 frames. ], tot_loss[loss=0.08571, simple_loss=0.1057, pruned_loss=0.02291, audio_tagging_loss=0.009972, over 3052553.13 frames.
], batch size: 60, lr: 6.89e-03, grad_scale: 16.0 2023-11-19 19:12:57,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=771400.0, ans=0.1 2023-11-19 19:13:22,180 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.878e+01 8.310e+01 8.971e+01 9.844e+01 3.516e+02, threshold=1.794e+02, percent-clipped=1.0 2023-11-19 19:13:41,884 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 115750 2023-11-19 19:13:59,057 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 7550, loss[loss=0.1049, simple_loss=0.1258, pruned_loss=0.03114, audio_tagging_loss=0.01091, over 15197.00 frames. ], tot_loss[loss=0.0851, simple_loss=0.1047, pruned_loss=0.02269, audio_tagging_loss=0.01006, over 3050985.49 frames. ], batch size: 57, lr: 6.89e-03, grad_scale: 16.0 2023-11-19 19:14:04,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=771733.3333333334, ans=0.125 2023-11-19 19:14:33,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=771866.6666666666, ans=0.125 2023-11-19 19:14:43,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=771933.3333333334, ans=0.0 2023-11-19 19:14:45,287 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.92 vs. limit=10.0 2023-11-19 19:14:48,806 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 115800 2023-11-19 19:14:55,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=772000.0, ans=0.125 2023-11-19 19:14:58,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=772000.0, ans=0.1 2023-11-19 19:15:04,057 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 7600, loss[loss=0.07486, simple_loss=0.09291, pruned_loss=0.01896, audio_tagging_loss=0.009442, over 14500.00 frames. ], tot_loss[loss=0.08453, simple_loss=0.1037, pruned_loss=0.02254, audio_tagging_loss=0.01013, over 3046738.42 frames. ], batch size: 54, lr: 6.89e-03, grad_scale: 32.0 2023-11-19 19:15:26,607 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.98 vs. limit=15.0 2023-11-19 19:15:32,979 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.786e+01 8.395e+01 8.967e+01 1.032e+02 1.336e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-19 19:15:34,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=772200.0, ans=0.125 2023-11-19 19:15:41,632 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.67 vs. limit=15.0 2023-11-19 19:15:53,514 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 115850 2023-11-19 19:16:05,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=772333.3333333334, ans=0.2 2023-11-19 19:16:08,681 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 7650, loss[loss=0.07736, simple_loss=0.1051, pruned_loss=0.01509, audio_tagging_loss=0.00971, over 16187.00 frames. 
], tot_loss[loss=0.08463, simple_loss=0.1037, pruned_loss=0.0225, audio_tagging_loss=0.01027, over 3046906.06 frames. ], batch size: 58, lr: 6.89e-03, grad_scale: 32.0
2023-11-19 19:16:24,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=772466.6666666666, ans=0.1
2023-11-19 19:16:58,681 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 115900
2023-11-19 19:16:58,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=772600.0, ans=0.0
2023-11-19 19:17:15,211 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 7700, loss[loss=0.1017, simple_loss=0.1223, pruned_loss=0.03106, audio_tagging_loss=0.009466, over 14852.00 frames. ], tot_loss[loss=0.08468, simple_loss=0.1038, pruned_loss=0.02247, audio_tagging_loss=0.01031, over 3041845.52 frames. ], batch size: 54, lr: 6.88e-03, grad_scale: 32.0
2023-11-19 19:17:43,121 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.843e+01 8.063e+01 8.419e+01 9.189e+01 1.330e+02, threshold=1.684e+02, percent-clipped=0.0
2023-11-19 19:17:53,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=772933.3333333334, ans=0.125
2023-11-19 19:18:04,578 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 115950
2023-11-19 19:18:11,729 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.62 vs. limit=15.0
2023-11-19 19:18:14,654 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.15 vs. limit=22.5
2023-11-19 19:18:20,139 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 7750, loss[loss=0.06771, simple_loss=0.07527, pruned_loss=0.01843, audio_tagging_loss=0.01164, over 14328.00 frames. ], tot_loss[loss=0.08434, simple_loss=0.1032, pruned_loss=0.02226, audio_tagging_loss=0.01046, over 3041801.75 frames. ], batch size: 54, lr: 6.88e-03, grad_scale: 32.0
2023-11-19 19:18:36,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=773133.3333333334, ans=0.125
2023-11-19 19:18:37,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=773133.3333333334, ans=0.0
2023-11-19 19:18:39,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=773133.3333333334, ans=0.0
2023-11-19 19:19:01,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=773266.6666666666, ans=0.0
2023-11-19 19:19:09,779 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 116000
2023-11-19 19:19:11,395 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-116000.pt
2023-11-19 19:19:28,170 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 7800, loss[loss=0.06043, simple_loss=0.08079, pruned_loss=0.01009, audio_tagging_loss=0.009947, over 15638.00 frames. ], tot_loss[loss=0.08485, simple_loss=0.1042, pruned_loss=0.02237, audio_tagging_loss=0.01038, over 3045536.77 frames. ], batch size: 58, lr: 6.88e-03, grad_scale: 32.0
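The checkpoint.py line above fires exactly at batch 116000, suggesting a save every fixed number of batches with older files pruned. A hedged sketch follows; the interval, retention count, and saved fields are illustrative, not read from the recipe.

```python
# Sketch of batch-triggered checkpointing behind lines like
# "Saving checkpoint to .../checkpoint-116000.pt".

from pathlib import Path
import torch

def maybe_save_checkpoint(model, optimizer, batch_idx: int, exp_dir: Path,
                          save_every_n: int = 4000,      # illustrative interval
                          keep_last_k: int = 30) -> None:
    if batch_idx % save_every_n != 0:
        return
    ckpt = exp_dir / f"checkpoint-{batch_idx}.pt"
    torch.save(
        {"model": model.state_dict(),
         "optimizer": optimizer.state_dict(),
         "batch_idx_train": batch_idx},
        ckpt,
    )
    # prune the oldest checkpoints beyond keep_last_k
    old = sorted(exp_dir.glob("checkpoint-*.pt"),
                 key=lambda p: int(p.stem.split("-")[1]))
    for p in old[:-keep_last_k]:
        p.unlink()
```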
2023-11-19 19:19:58,887 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 8.729e+01 9.223e+01 9.822e+01 1.484e+02, threshold=1.845e+02, percent-clipped=0.0
2023-11-19 19:20:17,771 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 116050
2023-11-19 19:20:17,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=773600.0, ans=0.1
2023-11-19 19:20:20,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=773666.6666666666, ans=0.125
2023-11-19 19:20:34,231 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 7850, loss[loss=0.08669, simple_loss=0.1047, pruned_loss=0.02438, audio_tagging_loss=0.009974, over 14295.00 frames. ], tot_loss[loss=0.08512, simple_loss=0.1046, pruned_loss=0.02242, audio_tagging_loss=0.01039, over 3048146.27 frames. ], batch size: 56, lr: 6.88e-03, grad_scale: 16.0
2023-11-19 19:20:35,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=773733.3333333334, ans=0.125
2023-11-19 19:20:38,495 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.69 vs. limit=15.0
2023-11-19 19:20:39,739 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.15 vs. limit=15.0
2023-11-19 19:20:49,105 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0
2023-11-19 19:20:58,879 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.74 vs. limit=15.0
2023-11-19 19:21:06,561 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.19 vs. limit=10.0
2023-11-19 19:21:22,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=773933.3333333334, ans=0.125
2023-11-19 19:21:23,702 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 116100
2023-11-19 19:21:26,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=774000.0, ans=0.0
2023-11-19 19:21:33,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=774000.0, ans=0.0
2023-11-19 19:21:38,917 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 7900, loss[loss=0.09938, simple_loss=0.1269, pruned_loss=0.02602, audio_tagging_loss=0.009917, over 15013.00 frames. ], tot_loss[loss=0.0848, simple_loss=0.104, pruned_loss=0.02231, audio_tagging_loss=0.01047, over 3039659.74 frames. ], batch size: 57, lr: 6.88e-03, grad_scale: 16.0
2023-11-19 19:21:50,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=774133.3333333334, ans=0.0
2023-11-19 19:21:53,374 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.94 vs. limit=22.5
2023-11-19 19:21:57,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=774133.3333333334, ans=0.0
2023-11-19 19:22:08,443 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.972e+01 8.408e+01 9.159e+01 1.027e+02 1.292e+02, threshold=1.832e+02, percent-clipped=0.0
2023-11-19 19:22:10,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=774200.0, ans=0.2
2023-11-19 19:22:27,805 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 116150
2023-11-19 19:22:29,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=774333.3333333334, ans=0.125
2023-11-19 19:22:36,588 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.91 vs. limit=15.0
2023-11-19 19:22:43,194 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 7950, loss[loss=0.09955, simple_loss=0.1256, pruned_loss=0.02614, audio_tagging_loss=0.01061, over 14928.00 frames. ], tot_loss[loss=0.08544, simple_loss=0.1043, pruned_loss=0.02266, audio_tagging_loss=0.01063, over 3036342.53 frames. ], batch size: 57, lr: 6.88e-03, grad_scale: 16.0
2023-11-19 19:22:57,995 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 19:23:05,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=774466.6666666666, ans=0.0
2023-11-19 19:23:07,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=774466.6666666666, ans=0.2
2023-11-19 19:23:32,876 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 116200
2023-11-19 19:23:39,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=774666.6666666666, ans=0.0
2023-11-19 19:23:39,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=774666.6666666666, ans=0.125
2023-11-19 19:23:49,428 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 8000, loss[loss=0.08746, simple_loss=0.1015, pruned_loss=0.02428, audio_tagging_loss=0.01243, over 15620.00 frames. ], tot_loss[loss=0.08543, simple_loss=0.1038, pruned_loss=0.02274, audio_tagging_loss=0.01078, over 3038994.61 frames. ], batch size: 59, lr: 6.88e-03, grad_scale: 32.0
2023-11-19 19:24:05,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=774800.0, ans=0.125
2023-11-19 19:24:19,080 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.333e+01 8.445e+01 9.319e+01 1.021e+02 1.426e+02, threshold=1.864e+02, percent-clipped=0.0
2023-11-19 19:24:35,286 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.47 vs. limit=22.5
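The WARNING above drops an AudioSet placeholder cut because only 23 encoder frames survive subsampling while the dummy transcript has 24 BPE tokens, so no monotonic alignment is possible. One conv-frontend arithmetic that reproduces the logged 100 -> 23 is sketched below; the exact formula used by the recipe is an assumption.

```python
# Sketch of the filter implied by the "Exclude cut with ID ..." warnings:
# keep a cut only if enough encoder frames remain for its token sequence.

def frames_after_subsampling(num_frames: int) -> int:
    # one plausible two-stage conv-subsampling formula; it maps 100 -> 23
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23, as in the log
print(keep_cut(100, 24))              # False -> the cut is excluded
```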
2023-11-19 19:24:38,984 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 116250
2023-11-19 19:24:54,249 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 8050, loss[loss=0.0799, simple_loss=0.0936, pruned_loss=0.01565, audio_tagging_loss=0.01745, over 15555.00 frames. ], tot_loss[loss=0.08538, simple_loss=0.1034, pruned_loss=0.02277, audio_tagging_loss=0.0109, over 3038161.74 frames. ], batch size: 57, lr: 6.87e-03, grad_scale: 32.0
2023-11-19 19:24:58,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=775066.6666666666, ans=0.0
2023-11-19 19:25:06,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=775133.3333333334, ans=0.0
2023-11-19 19:25:22,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=775200.0, ans=0.0
2023-11-19 19:25:44,057 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 116300
2023-11-19 19:25:44,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=775266.6666666666, ans=0.1
2023-11-19 19:25:49,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=775333.3333333334, ans=0.125
2023-11-19 19:25:51,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=775333.3333333334, ans=0.07
2023-11-19 19:25:59,363 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 8100, loss[loss=0.1056, simple_loss=0.1343, pruned_loss=0.0291, audio_tagging_loss=0.009309, over 15313.00 frames. ], tot_loss[loss=0.08567, simple_loss=0.1039, pruned_loss=0.023, audio_tagging_loss=0.01074, over 3034699.68 frames. ], batch size: 56, lr: 6.87e-03, grad_scale: 32.0
2023-11-19 19:26:14,597 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.65 vs. limit=15.0
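The Whitening entries throughout compare a per-module statistic against a limit; the metric appears to measure how far the feature covariance is from a multiple of the identity (near 1 for perfectly "white" activations, larger when a few directions dominate). A plausible proxy is sketched below; this exact formula is an assumption, not the actual scaling.py code.

```python
# Hedged sketch of a whitening statistic of the kind the
# "Whitening: ... metric=... vs. limit=..." lines monitor.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels); channels are split into groups
    metrics = []
    for c in x.chunk(num_groups, dim=-1):
        c = c - c.mean(dim=0, keepdim=True)
        cov = (c.T @ c) / c.shape[0]
        d = cov.shape[0]
        # trace(cov @ cov) * d / trace(cov)^2 >= 1, with equality exactly
        # when all eigenvalues are equal, i.e. cov is a multiple of I.
        metrics.append((torch.trace(cov @ cov) * d / torch.trace(cov) ** 2).item())
    return sum(metrics) / len(metrics)

x = torch.randn(1000, 128)
print(whitening_metric(x, num_groups=4))  # close to 1 for white noise
```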
2023-11-19 19:26:17,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=775466.6666666666, ans=0.125
2023-11-19 19:26:17,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=775466.6666666666, ans=0.2
2023-11-19 19:26:19,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=775466.6666666666, ans=0.2
2023-11-19 19:26:26,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=775533.3333333334, ans=0.125
2023-11-19 19:26:29,913 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.744e+01 8.355e+01 8.902e+01 9.485e+01 1.238e+02, threshold=1.780e+02, percent-clipped=0.0
2023-11-19 19:26:35,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=775533.3333333334, ans=0.0
2023-11-19 19:26:48,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=775600.0, ans=0.125
2023-11-19 19:26:49,870 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 116350
2023-11-19 19:26:51,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=775666.6666666666, ans=0.1
2023-11-19 19:27:00,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=775666.6666666666, ans=0.1
2023-11-19 19:27:05,397 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 8150, loss[loss=0.09696, simple_loss=0.1191, pruned_loss=0.02684, audio_tagging_loss=0.01056, over 14915.00 frames. ], tot_loss[loss=0.08606, simple_loss=0.105, pruned_loss=0.02312, audio_tagging_loss=0.01047, over 3039982.18 frames. ], batch size: 56, lr: 6.87e-03, grad_scale: 32.0
2023-11-19 19:27:23,635 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.91 vs. limit=15.0
2023-11-19 19:27:24,581 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=4.355e-01
2023-11-19 19:27:55,528 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 116400
2023-11-19 19:28:10,291 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 19:28:11,454 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 8200, loss[loss=0.08908, simple_loss=0.11, pruned_loss=0.02564, audio_tagging_loss=0.008421, over 15570.00 frames. ], tot_loss[loss=0.08626, simple_loss=0.1056, pruned_loss=0.02321, audio_tagging_loss=0.01028, over 3048677.55 frames. ], batch size: 57, lr: 6.87e-03, grad_scale: 32.0
2023-11-19 19:28:18,693 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.55 vs. limit=15.0
2023-11-19 19:28:41,266 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.260e+01 8.595e+01 9.342e+01 1.031e+02 1.321e+02, threshold=1.868e+02, percent-clipped=0.0
2023-11-19 19:28:44,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=776200.0, ans=0.125
2023-11-19 19:28:46,917 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.03 vs. limit=15.0
2023-11-19 19:28:48,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=776200.0, ans=0.0
2023-11-19 19:29:01,269 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 116450
2023-11-19 19:29:11,690 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 19:29:16,453 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 8250, loss[loss=0.09543, simple_loss=0.1158, pruned_loss=0.0252, audio_tagging_loss=0.01234, over 13789.00 frames. ], tot_loss[loss=0.08673, simple_loss=0.1061, pruned_loss=0.02348, audio_tagging_loss=0.01021, over 3042588.24 frames. ], batch size: 52, lr: 6.87e-03, grad_scale: 32.0
2023-11-19 19:29:19,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=776400.0, ans=0.125
2023-11-19 19:29:36,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=776466.6666666666, ans=0.125
2023-11-19 19:29:44,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=776533.3333333334, ans=0.0
2023-11-19 19:29:50,481 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.56 vs. limit=15.0
2023-11-19 19:29:50,575 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0
2023-11-19 19:29:51,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=776533.3333333334, ans=0.0
2023-11-19 19:30:06,027 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 116500
2023-11-19 19:30:22,189 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 8300, loss[loss=0.06861, simple_loss=0.0851, pruned_loss=0.01586, audio_tagging_loss=0.0102, over 15000.00 frames. ], tot_loss[loss=0.08627, simple_loss=0.1055, pruned_loss=0.02323, audio_tagging_loss=0.01028, over 3041494.81 frames. ], batch size: 57, lr: 6.87e-03, grad_scale: 32.0
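The optim.py lines log grad-norm quartiles alongside Clipping_scale=2.0, and the logged threshold sits close to twice the logged median, which suggests clipping at a multiple of a running median of gradient norms. A sketch under that assumption (the history length and logging policy are illustrative, not the actual optim.py code):

```python
# Hedged sketch of median-based gradient clipping with quartile logging,
# shaped like "Clipping_scale=2.0, grad-norm quartiles ... threshold=...".

from collections import deque
import torch

class QuartileClipper:
    def __init__(self, clipping_scale: float = 2.0, history: int = 1024):
        self.scale = clipping_scale
        self.norms = deque(maxlen=history)   # recent global grad norms
        self.clipped = 0
        self.seen = 0

    def clip_(self, params) -> float:
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms.append(norm)
        self.seen += 1
        median = torch.tensor(list(self.norms)).median().item()
        threshold = self.scale * median      # e.g. 2.0 * ~9.0e+01 ~= 1.8e+02
        if norm > threshold:
            self.clipped += 1                # feeds "percent-clipped"
            for g in grads:
                g.mul_(threshold / norm)
        return norm

    def quartiles(self):
        t = torch.tensor(list(self.norms))
        return [torch.quantile(t, q).item() for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
```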
2023-11-19 19:30:30,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=776733.3333333334, ans=0.0
2023-11-19 19:30:51,981 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.779e+01 8.250e+01 8.993e+01 9.595e+01 1.317e+02, threshold=1.799e+02, percent-clipped=0.0
2023-11-19 19:30:55,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=776866.6666666666, ans=0.2
2023-11-19 19:31:07,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=776933.3333333334, ans=0.125
2023-11-19 19:31:11,736 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 116550
2023-11-19 19:31:27,877 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 8350, loss[loss=0.1006, simple_loss=0.111, pruned_loss=0.03119, audio_tagging_loss=0.01391, over 14839.00 frames. ], tot_loss[loss=0.08678, simple_loss=0.1065, pruned_loss=0.02335, audio_tagging_loss=0.01018, over 3040496.88 frames. ], batch size: 55, lr: 6.86e-03, grad_scale: 32.0
2023-11-19 19:31:28,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=777066.6666666666, ans=0.2
2023-11-19 19:31:33,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=777066.6666666666, ans=0.125
2023-11-19 19:31:53,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=777200.0, ans=0.125
2023-11-19 19:31:58,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=777200.0, ans=0.0
2023-11-19 19:32:17,468 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 116600
2023-11-19 19:32:32,805 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 8400, loss[loss=0.08056, simple_loss=0.09346, pruned_loss=0.02054, audio_tagging_loss=0.01329, over 14990.00 frames. ], tot_loss[loss=0.08686, simple_loss=0.1068, pruned_loss=0.02328, audio_tagging_loss=0.01019, over 3047007.61 frames. ], batch size: 56, lr: 6.86e-03, grad_scale: 32.0
2023-11-19 19:32:33,394 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.02 vs. limit=15.0
2023-11-19 19:32:51,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=777466.6666666666, ans=0.125
2023-11-19 19:32:59,219 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0
2023-11-19 19:33:02,030 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.27 vs.
limit=22.5 2023-11-19 19:33:03,761 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.528e+01 8.258e+01 9.022e+01 9.929e+01 1.314e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-19 19:33:04,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=777533.3333333334, ans=0.1 2023-11-19 19:33:06,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=777533.3333333334, ans=0.95 2023-11-19 19:33:11,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=777600.0, ans=0.0 2023-11-19 19:33:12,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=777600.0, ans=0.2 2023-11-19 19:33:22,330 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 116650 2023-11-19 19:33:23,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=777666.6666666666, ans=0.0 2023-11-19 19:33:24,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=777666.6666666666, ans=0.125 2023-11-19 19:33:26,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=777666.6666666666, ans=0.1 2023-11-19 19:33:32,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=777666.6666666666, ans=0.0 2023-11-19 19:33:37,649 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 8450, loss[loss=0.06968, simple_loss=0.08116, pruned_loss=0.018, audio_tagging_loss=0.0111, over 15167.00 frames. ], tot_loss[loss=0.08694, simple_loss=0.1068, pruned_loss=0.0233, audio_tagging_loss=0.01026, over 3050272.99 frames. ], batch size: 58, lr: 6.86e-03, grad_scale: 32.0 2023-11-19 19:33:45,037 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.72 vs. limit=15.0 2023-11-19 19:34:22,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=777933.3333333334, ans=0.0 2023-11-19 19:34:27,528 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 116700 2023-11-19 19:34:40,998 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.50 vs. limit=22.5 2023-11-19 19:34:44,140 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 8500, loss[loss=0.1176, simple_loss=0.1358, pruned_loss=0.039, audio_tagging_loss=0.01068, over 14604.00 frames. ], tot_loss[loss=0.08622, simple_loss=0.1058, pruned_loss=0.02303, audio_tagging_loss=0.01031, over 3045619.52 frames. 
], batch size: 56, lr: 6.86e-03, grad_scale: 32.0 2023-11-19 19:35:06,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=778133.3333333334, ans=0.0 2023-11-19 19:35:13,937 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.558e+01 8.293e+01 8.958e+01 9.904e+01 1.302e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-19 19:35:15,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=778200.0, ans=0.125 2023-11-19 19:35:33,133 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 116750 2023-11-19 19:35:35,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=778333.3333333334, ans=0.0 2023-11-19 19:35:37,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=778333.3333333334, ans=0.0 2023-11-19 19:35:39,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=778333.3333333334, ans=0.125 2023-11-19 19:35:48,132 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 8550, loss[loss=0.09452, simple_loss=0.1191, pruned_loss=0.02649, audio_tagging_loss=0.008469, over 14755.00 frames. ], tot_loss[loss=0.08567, simple_loss=0.1052, pruned_loss=0.0227, audio_tagging_loss=0.01037, over 3049134.60 frames. ], batch size: 53, lr: 6.86e-03, grad_scale: 16.0 2023-11-19 19:35:50,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=778400.0, ans=0.125 2023-11-19 19:36:35,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=778600.0, ans=0.0 2023-11-19 19:36:37,893 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 116800 2023-11-19 19:36:42,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=778666.6666666666, ans=0.0 2023-11-19 19:36:46,300 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0 2023-11-19 19:36:49,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=778666.6666666666, ans=0.125 2023-11-19 19:36:52,890 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 8600, loss[loss=0.08869, simple_loss=0.1036, pruned_loss=0.02292, audio_tagging_loss=0.01396, over 14760.00 frames. ], tot_loss[loss=0.08516, simple_loss=0.1043, pruned_loss=0.02243, audio_tagging_loss=0.01058, over 3050771.62 frames. 
], batch size: 55, lr: 6.86e-03, grad_scale: 16.0 2023-11-19 19:36:55,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=778733.3333333334, ans=0.125 2023-11-19 19:37:03,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=778733.3333333334, ans=0.1 2023-11-19 19:37:07,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=778800.0, ans=0.1 2023-11-19 19:37:24,749 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.251e+01 8.439e+01 9.027e+01 1.024e+02 1.472e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-19 19:37:42,157 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 116850 2023-11-19 19:37:44,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=779000.0, ans=0.0 2023-11-19 19:37:59,585 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 8650, loss[loss=0.1031, simple_loss=0.1273, pruned_loss=0.03035, audio_tagging_loss=0.009143, over 15253.00 frames. ], tot_loss[loss=0.08563, simple_loss=0.1049, pruned_loss=0.02265, audio_tagging_loss=0.01055, over 3056060.97 frames. ], batch size: 57, lr: 6.86e-03, grad_scale: 16.0 2023-11-19 19:38:06,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=779066.6666666666, ans=0.125 2023-11-19 19:38:24,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=779200.0, ans=0.0 2023-11-19 19:38:46,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=779266.6666666666, ans=0.125 2023-11-19 19:38:47,106 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.31 vs. limit=22.5 2023-11-19 19:38:48,828 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 116900 2023-11-19 19:38:48,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=779266.6666666666, ans=0.0 2023-11-19 19:39:03,578 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 8700, loss[loss=0.1013, simple_loss=0.1245, pruned_loss=0.03165, audio_tagging_loss=0.00741, over 16229.00 frames. ], tot_loss[loss=0.08524, simple_loss=0.1045, pruned_loss=0.02244, audio_tagging_loss=0.01054, over 3060896.73 frames. ], batch size: 59, lr: 6.85e-03, grad_scale: 16.0 2023-11-19 19:39:35,921 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.849e+01 8.405e+01 9.122e+01 9.859e+01 1.308e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-19 19:39:38,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=779533.3333333334, ans=0.0 2023-11-19 19:39:53,701 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 116950 2023-11-19 19:40:05,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=779666.6666666666, ans=0.125 2023-11-19 19:40:08,883 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 8750, loss[loss=0.05732, simple_loss=0.06271, pruned_loss=0.01152, audio_tagging_loss=0.01444, over 13886.00 frames. 
], tot_loss[loss=0.08592, simple_loss=0.1053, pruned_loss=0.02269, audio_tagging_loss=0.01055, over 3054894.52 frames. ], batch size: 55, lr: 6.85e-03, grad_scale: 16.0 2023-11-19 19:40:46,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=779866.6666666666, ans=0.0 2023-11-19 19:40:51,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=779933.3333333334, ans=0.125 2023-11-19 19:40:58,648 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 117000 2023-11-19 19:41:12,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=780000.0, ans=0.05 2023-11-19 19:41:14,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=780066.6666666666, ans=0.125 2023-11-19 19:41:15,647 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 8800, loss[loss=0.108, simple_loss=0.1319, pruned_loss=0.03007, audio_tagging_loss=0.01193, over 15487.00 frames. ], tot_loss[loss=0.08689, simple_loss=0.1063, pruned_loss=0.02313, audio_tagging_loss=0.01058, over 3057063.75 frames. ], batch size: 56, lr: 6.85e-03, grad_scale: 32.0 2023-11-19 19:41:39,451 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.31 vs. limit=12.0 2023-11-19 19:41:46,212 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.896e+01 8.517e+01 9.316e+01 1.005e+02 1.428e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-19 19:42:02,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=780266.6666666666, ans=0.2 2023-11-19 19:42:05,437 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 117050 2023-11-19 19:42:15,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=780333.3333333334, ans=12.0 2023-11-19 19:42:17,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=780333.3333333334, ans=0.0 2023-11-19 19:42:19,897 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 19:42:20,910 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 8850, loss[loss=0.07919, simple_loss=0.1037, pruned_loss=0.01693, audio_tagging_loss=0.0104, over 15031.00 frames. ], tot_loss[loss=0.08627, simple_loss=0.1057, pruned_loss=0.0228, audio_tagging_loss=0.01062, over 3048222.35 frames. ], batch size: 55, lr: 6.85e-03, grad_scale: 32.0 2023-11-19 19:42:31,031 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 19:42:44,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=780466.6666666666, ans=0.0 2023-11-19 19:43:10,889 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 117100 2023-11-19 19:43:13,818 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.49 vs. limit=22.5 2023-11-19 19:43:25,686 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 8900, loss[loss=0.09376, simple_loss=0.1152, pruned_loss=0.02959, audio_tagging_loss=0.006555, over 14668.00 frames. ], tot_loss[loss=0.08597, simple_loss=0.1054, pruned_loss=0.02274, audio_tagging_loss=0.01052, over 3046266.18 frames. ], batch size: 54, lr: 6.85e-03, grad_scale: 32.0 2023-11-19 19:43:38,269 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.94 vs. limit=15.0 2023-11-19 19:43:42,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=780800.0, ans=0.0 2023-11-19 19:43:51,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=780866.6666666666, ans=0.2 2023-11-19 19:43:57,706 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.896e+01 8.524e+01 9.164e+01 1.008e+02 2.519e+02, threshold=1.833e+02, percent-clipped=1.0 2023-11-19 19:44:10,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=780933.3333333334, ans=0.1 2023-11-19 19:44:15,155 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 117150 2023-11-19 19:44:27,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=781000.0, ans=0.125 2023-11-19 19:44:30,948 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 8950, loss[loss=0.08979, simple_loss=0.1103, pruned_loss=0.0258, audio_tagging_loss=0.008827, over 14470.00 frames. ], tot_loss[loss=0.08516, simple_loss=0.1046, pruned_loss=0.02246, audio_tagging_loss=0.01041, over 3053315.22 frames. 
], batch size: 53, lr: 6.85e-03, grad_scale: 32.0
2023-11-19 19:44:36,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=781066.6666666666, ans=0.125
2023-11-19 19:44:39,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=781066.6666666666, ans=0.2
2023-11-19 19:45:04,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=781200.0, ans=0.05
2023-11-19 19:45:07,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=781200.0, ans=0.1
2023-11-19 19:45:09,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=781266.6666666666, ans=0.0
2023-11-19 19:45:20,422 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 117200
2023-11-19 19:45:23,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=781333.3333333334, ans=22.5
2023-11-19 19:45:30,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=781333.3333333334, ans=0.0
2023-11-19 19:45:36,439 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 9000, loss[loss=0.09085, simple_loss=0.1084, pruned_loss=0.0262, audio_tagging_loss=0.01045, over 15417.00 frames. ], tot_loss[loss=0.08501, simple_loss=0.1044, pruned_loss=0.0225, audio_tagging_loss=0.01031, over 3057704.79 frames. ], batch size: 58, lr: 6.85e-03, grad_scale: 16.0
2023-11-19 19:45:36,443 INFO [train_asr.py:1285] (0/4) Computing validation loss
2023-11-19 19:46:18,844 INFO [train_asr.py:1294] (0/4) Epoch 10, validation: loss=0.06518, simple_loss=0.05524, pruned_loss=0.006372, audio_tagging_loss=0.03119, over 4681554.00 frames.
2023-11-19 19:46:18,844 INFO [train_asr.py:1295] (0/4) Maximum memory allocated so far is 25925MB
2023-11-19 19:46:50,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=781533.3333333334, ans=0.2
2023-11-19 19:46:51,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=781533.3333333334, ans=0.95
2023-11-19 19:46:52,444 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.260e+01 8.536e+01 8.923e+01 9.790e+01 1.498e+02, threshold=1.785e+02, percent-clipped=0.0
2023-11-19 19:47:05,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=781600.0, ans=0.0
2023-11-19 19:47:06,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=781600.0, ans=0.125
2023-11-19 19:47:08,933 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 117250
2023-11-19 19:47:12,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=781666.6666666666, ans=0.09899494936611666
2023-11-19 19:47:25,441 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 9050, loss[loss=0.05636, simple_loss=0.06939, pruned_loss=0.01271, audio_tagging_loss=0.00896, over 14791.00 frames. ], tot_loss[loss=0.08537, simple_loss=0.105, pruned_loss=0.02262, audio_tagging_loss=0.01023, over 3064979.35 frames. ], batch size: 58, lr: 6.84e-03, grad_scale: 16.0
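The validation block above ("Computing validation loss" ... "Maximum memory allocated so far is 25925MB") indicates a periodic no-grad pass over the dev set at fixed batch intervals. A minimal sketch, with compute_loss standing in for the recipe's actual loss function:

```python
# Sketch of a periodic validation pass matching the shape of the
# "Computing validation loss" / "validation: loss=..." log lines.

import torch

def validate(model, valid_loader, compute_loss, device) -> float:
    model.eval()
    tot, frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            # compute_loss is assumed to return (loss tensor, num_frames)
            loss, num_frames = compute_loss(model, batch)
            tot += loss.item() * num_frames
            frames += num_frames
    model.train()
    max_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"validation: loss={tot / frames:.5f}; max memory {max_mb}MB")
    return tot / frames
```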
2023-11-19 19:47:49,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=781866.6666666666, ans=0.125
2023-11-19 19:48:01,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=781866.6666666666, ans=0.1
2023-11-19 19:48:15,120 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 117300
2023-11-19 19:48:26,429 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.66 vs. limit=15.0
2023-11-19 19:48:30,602 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 9100, loss[loss=0.106, simple_loss=0.1317, pruned_loss=0.0324, audio_tagging_loss=0.007747, over 15558.00 frames. ], tot_loss[loss=0.08491, simple_loss=0.1044, pruned_loss=0.02248, audio_tagging_loss=0.01021, over 3061744.49 frames. ], batch size: 57, lr: 6.84e-03, grad_scale: 16.0
2023-11-19 19:48:32,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=782066.6666666666, ans=10.0
2023-11-19 19:48:33,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=782066.6666666666, ans=0.125
2023-11-19 19:48:42,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=782133.3333333334, ans=0.125
2023-11-19 19:48:42,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=782133.3333333334, ans=0.1
2023-11-19 19:49:03,189 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.816e+01 8.111e+01 8.734e+01 9.488e+01 1.224e+02, threshold=1.747e+02, percent-clipped=0.0
2023-11-19 19:49:21,041 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 117350
2023-11-19 19:49:21,645 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.25 vs. limit=22.5
2023-11-19 19:49:36,007 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 9150, loss[loss=0.1003, simple_loss=0.1358, pruned_loss=0.02548, audio_tagging_loss=0.006968, over 16014.00 frames. ], tot_loss[loss=0.08446, simple_loss=0.1042, pruned_loss=0.02218, audio_tagging_loss=0.01019, over 3062484.68 frames. ], batch size: 57, lr: 6.84e-03, grad_scale: 16.0
2023-11-19 19:49:45,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=782400.0, ans=0.125
2023-11-19 19:50:01,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=782533.3333333334, ans=0.2
2023-11-19 19:50:26,044 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 117400
2023-11-19 19:50:34,081 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.80 vs. limit=12.0
2023-11-19 19:50:42,582 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 9200, loss[loss=0.07342, simple_loss=0.09272, pruned_loss=0.01949, audio_tagging_loss=0.00758, over 15303.00 frames. ], tot_loss[loss=0.08489, simple_loss=0.1046, pruned_loss=0.02247, audio_tagging_loss=0.01015, over 3069046.47 frames.
], batch size: 57, lr: 6.84e-03, grad_scale: 32.0 2023-11-19 19:51:00,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=782800.0, ans=0.125 2023-11-19 19:51:15,182 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.014e+01 8.317e+01 9.062e+01 1.049e+02 1.537e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-19 19:51:25,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=782933.3333333334, ans=0.2 2023-11-19 19:51:26,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=782933.3333333334, ans=0.1 2023-11-19 19:51:31,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=782933.3333333334, ans=0.125 2023-11-19 19:51:33,118 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 117450 2023-11-19 19:51:39,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=783000.0, ans=0.0 2023-11-19 19:51:48,865 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 9250, loss[loss=0.05059, simple_loss=0.06157, pruned_loss=0.009828, audio_tagging_loss=0.009973, over 14934.00 frames. ], tot_loss[loss=0.0848, simple_loss=0.1042, pruned_loss=0.02254, audio_tagging_loss=0.01014, over 3068041.68 frames. ], batch size: 57, lr: 6.84e-03, grad_scale: 32.0 2023-11-19 19:52:04,833 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2023-11-19 19:52:06,244 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.09 vs. limit=22.5 2023-11-19 19:52:29,575 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.02 vs. limit=15.0 2023-11-19 19:52:32,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=783266.6666666666, ans=0.125 2023-11-19 19:52:38,875 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 117500 2023-11-19 19:52:46,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=783333.3333333334, ans=0.125 2023-11-19 19:52:54,391 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 9300, loss[loss=0.06524, simple_loss=0.08134, pruned_loss=0.01485, audio_tagging_loss=0.009719, over 14597.00 frames. ], tot_loss[loss=0.08466, simple_loss=0.1039, pruned_loss=0.02247, audio_tagging_loss=0.01026, over 3071937.72 frames. 
], batch size: 56, lr: 6.84e-03, grad_scale: 32.0 2023-11-19 19:52:56,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=783400.0, ans=0.04949747468305833 2023-11-19 19:53:00,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=783400.0, ans=0.125 2023-11-19 19:53:19,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=783533.3333333334, ans=0.1 2023-11-19 19:53:22,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=783533.3333333334, ans=0.125 2023-11-19 19:53:26,981 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.639e+01 8.312e+01 9.036e+01 9.787e+01 1.162e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-19 19:53:43,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=783600.0, ans=0.2 2023-11-19 19:53:44,131 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 117550 2023-11-19 19:53:52,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=783666.6666666666, ans=0.1 2023-11-19 19:53:59,614 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 9350, loss[loss=0.0986, simple_loss=0.1184, pruned_loss=0.02921, audio_tagging_loss=0.01022, over 14069.00 frames. ], tot_loss[loss=0.08454, simple_loss=0.104, pruned_loss=0.02236, audio_tagging_loss=0.01019, over 3064060.62 frames. ], batch size: 54, lr: 6.84e-03, grad_scale: 32.0 2023-11-19 19:54:44,968 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.75 vs. limit=15.0 2023-11-19 19:54:49,464 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 117600 2023-11-19 19:55:05,801 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 9400, loss[loss=0.05269, simple_loss=0.05891, pruned_loss=0.01106, audio_tagging_loss=0.01217, over 16677.00 frames. ], tot_loss[loss=0.08406, simple_loss=0.1032, pruned_loss=0.02212, audio_tagging_loss=0.01035, over 3064525.01 frames. ], batch size: 64, lr: 6.83e-03, grad_scale: 16.0 2023-11-19 19:55:12,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=784066.6666666666, ans=0.0 2023-11-19 19:55:21,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=784133.3333333334, ans=0.125 2023-11-19 19:55:32,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=784200.0, ans=0.0 2023-11-19 19:55:39,191 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.194e+01 8.476e+01 9.081e+01 1.030e+02 1.355e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-19 19:55:55,273 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 117650 2023-11-19 19:55:57,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=784333.3333333334, ans=0.125 2023-11-19 19:56:06,421 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 19:56:11,009 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 9450, loss[loss=0.07951, simple_loss=0.1039, pruned_loss=0.01528, audio_tagging_loss=0.01227, over 16027.00 frames. ], tot_loss[loss=0.08517, simple_loss=0.1047, pruned_loss=0.02248, audio_tagging_loss=0.01035, over 3064651.87 frames. ], batch size: 58, lr: 6.83e-03, grad_scale: 16.0 2023-11-19 19:56:13,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=784400.0, ans=0.125 2023-11-19 19:56:17,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=784400.0, ans=0.125 2023-11-19 19:56:32,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=784466.6666666666, ans=0.0 2023-11-19 19:56:39,754 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.25 vs. limit=15.0 2023-11-19 19:56:52,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=784600.0, ans=0.125 2023-11-19 19:57:00,656 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 117700 2023-11-19 19:57:08,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=784666.6666666666, ans=0.2 2023-11-19 19:57:09,917 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.59 vs. limit=15.0 2023-11-19 19:57:16,054 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 9500, loss[loss=0.06295, simple_loss=0.07363, pruned_loss=0.01326, audio_tagging_loss=0.01287, over 15172.00 frames. ], tot_loss[loss=0.08476, simple_loss=0.1042, pruned_loss=0.02228, audio_tagging_loss=0.01038, over 3056518.54 frames. ], batch size: 56, lr: 6.83e-03, grad_scale: 16.0 2023-11-19 19:57:16,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=784733.3333333334, ans=0.125 2023-11-19 19:57:36,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=784800.0, ans=0.125 2023-11-19 19:57:49,739 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.346e+01 9.084e+01 9.988e+01 1.421e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-19 19:57:57,389 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.73 vs. limit=15.0 2023-11-19 19:57:58,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=784933.3333333334, ans=0.125 2023-11-19 19:58:06,201 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 117750 2023-11-19 19:58:21,913 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 9550, loss[loss=0.07271, simple_loss=0.08502, pruned_loss=0.01708, audio_tagging_loss=0.01312, over 14640.00 frames. 
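
The tot_loss records above also show how the headline loss decomposes. Assuming the standard icefall pruned-transducer combination once warm-up is over, loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss (the 0.5 simple-loss weight and unit audio-tagging weight are this run's configured scales; the exact scheduled weighting inside train_asr.py is not visible in the log, so treat the formula as an assumption). The batch-9350 totals check out under it:

# Sanity check of the logged tot_loss composition (assumed formula):
# loss ~= 0.5 * simple_loss + pruned_loss + audio_tagging_loss
simple_loss = 0.104
pruned_loss = 0.02236
audio_tagging_loss = 0.01019
loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
print(f"{loss:.5f}")  # 0.08455, vs. the logged tot_loss loss=0.08454
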
], tot_loss[loss=0.08415, simple_loss=0.1029, pruned_loss=0.02211, audio_tagging_loss=0.01059, over 3054501.54 frames. ], batch size: 55, lr: 6.83e-03, grad_scale: 16.0 2023-11-19 19:58:32,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=785066.6666666666, ans=0.125 2023-11-19 19:58:46,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=785200.0, ans=0.125 2023-11-19 19:58:47,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=785200.0, ans=0.125 2023-11-19 19:58:56,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=785200.0, ans=0.125 2023-11-19 19:59:10,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=785266.6666666666, ans=0.125 2023-11-19 19:59:11,455 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 117800 2023-11-19 19:59:11,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=785266.6666666666, ans=0.09899494936611666 2023-11-19 19:59:26,812 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 9600, loss[loss=0.07631, simple_loss=0.1011, pruned_loss=0.0176, audio_tagging_loss=0.008145, over 15296.00 frames. ], tot_loss[loss=0.08451, simple_loss=0.1032, pruned_loss=0.02222, audio_tagging_loss=0.01068, over 3050382.50 frames. ], batch size: 57, lr: 6.83e-03, grad_scale: 32.0 2023-11-19 19:59:31,287 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.34 vs. 
limit=15.0 2023-11-19 19:59:43,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=785466.6666666666, ans=0.0 2023-11-19 19:59:43,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=785466.6666666666, ans=0.1 2023-11-19 19:59:51,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=785466.6666666666, ans=0.125 2023-11-19 19:59:55,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=785533.3333333334, ans=0.125 2023-11-19 19:59:56,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=785533.3333333334, ans=0.1 2023-11-19 20:00:01,484 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.979e+01 8.524e+01 9.148e+01 9.988e+01 1.418e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-19 20:00:01,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=785533.3333333334, ans=0.2 2023-11-19 20:00:06,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=785600.0, ans=0.2 2023-11-19 20:00:09,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=785600.0, ans=0.125 2023-11-19 20:00:11,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=785600.0, ans=0.125 2023-11-19 20:00:16,429 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 117850 2023-11-19 20:00:27,245 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.54 vs. limit=15.0 2023-11-19 20:00:31,408 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=12.0 2023-11-19 20:00:32,110 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 9650, loss[loss=0.08617, simple_loss=0.1072, pruned_loss=0.02374, audio_tagging_loss=0.008801, over 14933.00 frames. ], tot_loss[loss=0.08491, simple_loss=0.1037, pruned_loss=0.02245, audio_tagging_loss=0.01062, over 3047535.21 frames. 
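
The ScheduledFloat lines track hyper-parameters (dropout probabilities, skip rates, balancer probs) that are functions of the global batch_count rather than constants, which is why each entry is logged with the batch count at which it was evaluated. Below is a minimal piecewise-linear sketch of that idea; the name matches scaling.py, but the body is a simplified reconstruction, not the actual icefall class:

class ScheduledFloat:
    """Piecewise-linear float schedule keyed on a global batch count
    (simplified reconstruction; the real scaling.ScheduledFloat has more
    machinery around it)."""

    def __init__(self, *points):
        # points: (batch_count, value) pairs
        self.points = sorted(points)

    def value_at(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

# A dropout that anneals from 0.3 to 0.1 over the first 20k batches reads
# 0.1 at batch_count=785466, much like the dropout_p=0.1 entries above.
dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value_at(785466.0))  # 0.1
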
], batch size: 55, lr: 6.83e-03, grad_scale: 32.0 2023-11-19 20:00:32,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=785733.3333333334, ans=0.2 2023-11-19 20:00:49,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=785800.0, ans=0.0 2023-11-19 20:01:02,249 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 20:01:05,988 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 20:01:07,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=785866.6666666666, ans=0.125 2023-11-19 20:01:22,127 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 117900 2023-11-19 20:01:37,760 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 9700, loss[loss=0.09287, simple_loss=0.1025, pruned_loss=0.02625, audio_tagging_loss=0.01535, over 15067.00 frames. ], tot_loss[loss=0.08464, simple_loss=0.1032, pruned_loss=0.02253, audio_tagging_loss=0.01049, over 3045702.73 frames. ], batch size: 59, lr: 6.83e-03, grad_scale: 32.0 2023-11-19 20:02:11,062 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.874e+01 8.357e+01 9.241e+01 1.009e+02 1.482e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-19 20:02:13,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=786200.0, ans=0.1 2023-11-19 20:02:23,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=786266.6666666666, ans=0.025 2023-11-19 20:02:25,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=786266.6666666666, ans=0.125 2023-11-19 20:02:26,875 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 117950 2023-11-19 20:02:39,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=786333.3333333334, ans=0.1 2023-11-19 20:02:41,769 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 9750, loss[loss=0.0815, simple_loss=0.103, pruned_loss=0.02055, audio_tagging_loss=0.009458, over 15957.00 frames. ], tot_loss[loss=0.08407, simple_loss=0.1028, pruned_loss=0.02221, audio_tagging_loss=0.01044, over 3043968.28 frames. ], batch size: 59, lr: 6.82e-03, grad_scale: 32.0 2023-11-19 20:02:57,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=786466.6666666666, ans=0.125 2023-11-19 20:03:05,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=786466.6666666666, ans=0.07 2023-11-19 20:03:22,678 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.93 vs. limit=15.0 2023-11-19 20:03:28,797 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.14 vs. 
limit=15.0 2023-11-19 20:03:30,745 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 118000 2023-11-19 20:03:30,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=786600.0, ans=0.0 2023-11-19 20:03:33,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=786666.6666666666, ans=0.035 2023-11-19 20:03:45,034 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 20:03:46,739 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 9800, loss[loss=0.09273, simple_loss=0.1199, pruned_loss=0.02469, audio_tagging_loss=0.008076, over 15645.00 frames. ], tot_loss[loss=0.08449, simple_loss=0.1036, pruned_loss=0.02233, audio_tagging_loss=0.01036, over 3043410.44 frames. ], batch size: 58, lr: 6.82e-03, grad_scale: 32.0 2023-11-19 20:04:04,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=786800.0, ans=0.0 2023-11-19 20:04:13,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=786866.6666666666, ans=0.2 2023-11-19 20:04:20,504 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 8.585e+01 9.415e+01 1.041e+02 1.362e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-19 20:04:22,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=786866.6666666666, ans=0.0 2023-11-19 20:04:36,293 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 118050 2023-11-19 20:04:36,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=786933.3333333334, ans=0.1 2023-11-19 20:04:43,148 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 20:04:49,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=787000.0, ans=0.0 2023-11-19 20:04:52,875 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 9850, loss[loss=0.09632, simple_loss=0.1137, pruned_loss=0.02933, audio_tagging_loss=0.01012, over 15656.00 frames. ], tot_loss[loss=0.08533, simple_loss=0.1048, pruned_loss=0.02272, audio_tagging_loss=0.0102, over 3045426.25 frames. ], batch size: 58, lr: 6.82e-03, grad_scale: 32.0 2023-11-19 20:04:55,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=787066.6666666666, ans=0.1 2023-11-19 20:04:59,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=787066.6666666666, ans=0.125 2023-11-19 20:05:06,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=787133.3333333334, ans=0.2 2023-11-19 20:05:11,065 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.13 vs. 
limit=8.0 2023-11-19 20:05:15,581 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.98 vs. limit=15.0 2023-11-19 20:05:16,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=787200.0, ans=0.125 2023-11-19 20:05:16,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=787200.0, ans=0.125 2023-11-19 20:05:18,044 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.52 vs. limit=12.0 2023-11-19 20:05:19,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=787200.0, ans=0.125 2023-11-19 20:05:20,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=787200.0, ans=0.0 2023-11-19 20:05:30,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=787266.6666666666, ans=0.2 2023-11-19 20:05:41,651 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 118100 2023-11-19 20:05:44,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=787333.3333333334, ans=0.0 2023-11-19 20:05:54,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=787333.3333333334, ans=0.1 2023-11-19 20:05:56,631 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 9900, loss[loss=0.06005, simple_loss=0.07302, pruned_loss=0.01324, audio_tagging_loss=0.01029, over 17005.00 frames. ], tot_loss[loss=0.0847, simple_loss=0.104, pruned_loss=0.02249, audio_tagging_loss=0.01021, over 3041671.89 frames. ], batch size: 66, lr: 6.82e-03, grad_scale: 32.0 2023-11-19 20:05:58,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=787400.0, ans=0.07 2023-11-19 20:06:03,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=787400.0, ans=0.2 2023-11-19 20:06:04,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=787400.0, ans=0.0 2023-11-19 20:06:24,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=787533.3333333334, ans=0.0 2023-11-19 20:06:30,699 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.320e+01 8.268e+01 9.019e+01 9.736e+01 1.319e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-19 20:06:45,641 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 118150 2023-11-19 20:06:51,268 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.58 vs. limit=15.0 2023-11-19 20:07:00,320 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 9950, loss[loss=0.08429, simple_loss=0.1017, pruned_loss=0.02026, audio_tagging_loss=0.01316, over 14618.00 frames. ], tot_loss[loss=0.0838, simple_loss=0.1024, pruned_loss=0.02219, audio_tagging_loss=0.01039, over 3046316.49 frames. 
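
Each optim.py Clipping_scale line prints five order statistics of recent gradient norms (min, 25%, 50%, 75%, max) together with the active threshold, and throughout this section the threshold is exactly 2.0 times the logged median, e.g. 2.0 * 8.582e+01 = 1.716e+02 just above, with percent-clipped nonzero only when the max exceeds it. A sketch of that bookkeeping follows; MedianGradClipper is an invented name, and the real optimizer does this per parameter group:

import torch
from collections import deque

class MedianGradClipper:
    """Clip the global grad norm at clipping_scale * median of recent norms
    (hypothetical helper mirroring the quartile/threshold log lines)."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 100):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)

    def clip_(self, parameters):
        grads = [p.grad for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads]))
        self.norms.append(norm.item())
        q = torch.quantile(torch.tensor(list(self.norms)),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # 2.0 * median
        if norm.item() > threshold:
            for g in grads:
                g.mul_(threshold / norm.item())  # scale grads down in place
        return q, threshold
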
], batch size: 56, lr: 6.82e-03, grad_scale: 32.0 2023-11-19 20:07:13,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=787800.0, ans=0.05 2023-11-19 20:07:45,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=787933.3333333334, ans=0.125 2023-11-19 20:07:48,761 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 118200 2023-11-19 20:08:05,806 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 10000, loss[loss=0.1027, simple_loss=0.1485, pruned_loss=0.02088, audio_tagging_loss=0.007525, over 16411.00 frames. ], tot_loss[loss=0.08436, simple_loss=0.1037, pruned_loss=0.0222, audio_tagging_loss=0.01031, over 3046387.19 frames. ], batch size: 56, lr: 6.82e-03, grad_scale: 32.0 2023-11-19 20:08:09,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=788066.6666666666, ans=0.125 2023-11-19 20:08:14,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=788066.6666666666, ans=0.1 2023-11-19 20:08:31,289 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.32 vs. limit=10.0 2023-11-19 20:08:37,911 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.424e+01 7.892e+01 8.582e+01 9.307e+01 3.708e+02, threshold=1.716e+02, percent-clipped=1.0 2023-11-19 20:08:46,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=788266.6666666666, ans=0.125 2023-11-19 20:08:54,211 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 118250 2023-11-19 20:08:56,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=788333.3333333334, ans=0.125 2023-11-19 20:08:57,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=788333.3333333334, ans=0.0 2023-11-19 20:08:58,033 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.22 vs. limit=15.0 2023-11-19 20:09:09,683 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 10050, loss[loss=0.09368, simple_loss=0.1156, pruned_loss=0.02662, audio_tagging_loss=0.009253, over 14830.00 frames. ], tot_loss[loss=0.08495, simple_loss=0.1047, pruned_loss=0.02237, audio_tagging_loss=0.01023, over 3049172.72 frames. ], batch size: 56, lr: 6.82e-03, grad_scale: 32.0 2023-11-19 20:09:10,267 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.89 vs. limit=15.0 2023-11-19 20:09:21,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=788466.6666666666, ans=0.0 2023-11-19 20:09:30,668 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.94 vs. limit=22.5 2023-11-19 20:09:42,646 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.85 vs. 
limit=15.0 2023-11-19 20:09:58,770 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 118300 2023-11-19 20:10:13,513 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 10100, loss[loss=0.08673, simple_loss=0.1109, pruned_loss=0.0219, audio_tagging_loss=0.009371, over 14641.00 frames. ], tot_loss[loss=0.08541, simple_loss=0.1049, pruned_loss=0.02255, audio_tagging_loss=0.01039, over 3044663.59 frames. ], batch size: 56, lr: 6.81e-03, grad_scale: 32.0 2023-11-19 20:10:17,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=788733.3333333334, ans=0.0 2023-11-19 20:10:17,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=788733.3333333334, ans=0.1 2023-11-19 20:10:47,503 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.102e+01 8.462e+01 9.399e+01 1.049e+02 1.408e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-19 20:11:02,508 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 118350 2023-11-19 20:11:03,600 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 20:11:05,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=789000.0, ans=0.125 2023-11-19 20:11:07,024 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.63 vs. limit=15.0 2023-11-19 20:11:11,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=789000.0, ans=0.0 2023-11-19 20:11:12,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=789000.0, ans=0.2 2023-11-19 20:11:18,277 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 10150, loss[loss=0.1197, simple_loss=0.1503, pruned_loss=0.03746, audio_tagging_loss=0.007109, over 16145.00 frames. ], tot_loss[loss=0.08521, simple_loss=0.1045, pruned_loss=0.02248, audio_tagging_loss=0.01045, over 3048297.37 frames. ], batch size: 59, lr: 6.81e-03, grad_scale: 32.0 2023-11-19 20:11:26,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=789066.6666666666, ans=0.1 2023-11-19 20:11:27,559 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.24 vs. limit=15.0 2023-11-19 20:11:46,525 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
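
The "Exclude cut" warnings are internally consistent: a 1.000-second cut has 100 feature frames, the convolutional frontend maps that to ((100 - 7) // 2 + 1) // 2 = 23 encoder frames, and 23 frames cannot align against the 24 BPE tokens of the placeholder transcript, so the cut is unusable for the transducer loss. A sketch of such a filter is below; the frame arithmetic reproduces the logged before/after counts, but the exact predicate in train_asr.py is an assumption:

def frames_after_subsampling(num_frames: int) -> int:
    # Conv2dSubsampling-style arithmetic; maps 100 -> 23, matching the
    # "before/after subsampling" counts in the warnings.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Assumed predicate: drop cuts whose encoder output is shorter than
    # the token sequence (the real filter may also bound durations).
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23
print(keep_cut(100, 24))              # False -> "Exclude cut ..." warning
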
Number of tokens: 24 2023-11-19 20:11:50,768 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.70 vs. limit=15.0 2023-11-19 20:11:58,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=789266.6666666666, ans=0.0 2023-11-19 20:12:03,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=789266.6666666666, ans=0.125 2023-11-19 20:12:07,325 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 118400 2023-11-19 20:12:07,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=789266.6666666666, ans=0.0 2023-11-19 20:12:14,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=789333.3333333334, ans=0.2 2023-11-19 20:12:23,155 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 10200, loss[loss=0.1002, simple_loss=0.1151, pruned_loss=0.03457, audio_tagging_loss=0.008027, over 15911.00 frames. ], tot_loss[loss=0.08582, simple_loss=0.1053, pruned_loss=0.02267, audio_tagging_loss=0.01051, over 3054634.76 frames. ], batch size: 59, lr: 6.81e-03, grad_scale: 32.0 2023-11-19 20:12:27,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=789400.0, ans=10.0 2023-11-19 20:12:42,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=789466.6666666666, ans=0.125 2023-11-19 20:12:44,391 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 20:12:57,891 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.646e+01 8.394e+01 8.968e+01 9.888e+01 1.322e+02, threshold=1.794e+02, percent-clipped=0.0 2023-11-19 20:13:12,265 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 118450 2023-11-19 20:13:21,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=789666.6666666666, ans=0.04949747468305833 2023-11-19 20:13:22,344 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 20:13:23,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=789666.6666666666, ans=0.0 2023-11-19 20:13:27,007 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 10250, loss[loss=0.1197, simple_loss=0.1509, pruned_loss=0.03552, audio_tagging_loss=0.00877, over 16270.00 frames. ], tot_loss[loss=0.08552, simple_loss=0.1051, pruned_loss=0.02252, audio_tagging_loss=0.01047, over 3056747.62 frames. ], batch size: 58, lr: 6.81e-03, grad_scale: 16.0 2023-11-19 20:13:27,590 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. 
limit=6.0 2023-11-19 20:13:38,031 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.26 vs. limit=15.0 2023-11-19 20:13:58,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=789866.6666666666, ans=0.2 2023-11-19 20:14:00,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=789866.6666666666, ans=0.1 2023-11-19 20:14:05,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=789933.3333333334, ans=0.125 2023-11-19 20:14:16,534 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 118500 2023-11-19 20:14:31,740 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 10300, loss[loss=0.07482, simple_loss=0.08976, pruned_loss=0.01912, audio_tagging_loss=0.01082, over 16714.00 frames. ], tot_loss[loss=0.08542, simple_loss=0.1047, pruned_loss=0.02252, audio_tagging_loss=0.01053, over 3056978.36 frames. ], batch size: 61, lr: 6.81e-03, grad_scale: 16.0 2023-11-19 20:14:31,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=790066.6666666666, ans=0.0 2023-11-19 20:14:45,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=790133.3333333334, ans=0.2 2023-11-19 20:15:06,827 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.556e+01 8.163e+01 8.831e+01 9.880e+01 1.200e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-19 20:15:12,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=790266.6666666666, ans=0.1 2023-11-19 20:15:21,129 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 118550 2023-11-19 20:15:37,170 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 10350, loss[loss=0.08163, simple_loss=0.1126, pruned_loss=0.01526, audio_tagging_loss=0.01005, over 15800.00 frames. ], tot_loss[loss=0.08486, simple_loss=0.1041, pruned_loss=0.0222, audio_tagging_loss=0.0106, over 3055625.18 frames. ], batch size: 57, lr: 6.81e-03, grad_scale: 16.0 2023-11-19 20:15:37,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=790400.0, ans=0.0 2023-11-19 20:15:42,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=790400.0, ans=0.125 2023-11-19 20:15:49,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=790466.6666666666, ans=0.2 2023-11-19 20:15:55,718 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.87 vs. 
limit=22.5 2023-11-19 20:16:15,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=790600.0, ans=0.125 2023-11-19 20:16:22,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=790600.0, ans=0.0 2023-11-19 20:16:25,944 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 118600 2023-11-19 20:16:26,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=790600.0, ans=0.125 2023-11-19 20:16:39,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=790666.6666666666, ans=0.0 2023-11-19 20:16:41,683 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 10400, loss[loss=0.07927, simple_loss=0.1062, pruned_loss=0.01828, audio_tagging_loss=0.007905, over 14563.00 frames. ], tot_loss[loss=0.0848, simple_loss=0.1036, pruned_loss=0.02224, audio_tagging_loss=0.01074, over 3054963.23 frames. ], batch size: 52, lr: 6.81e-03, grad_scale: 32.0 2023-11-19 20:17:00,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=790800.0, ans=0.0 2023-11-19 20:17:01,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=790800.0, ans=0.125 2023-11-19 20:17:17,251 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.085e+01 8.510e+01 9.248e+01 1.057e+02 2.087e+02, threshold=1.850e+02, percent-clipped=1.0 2023-11-19 20:17:22,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=790933.3333333334, ans=0.125 2023-11-19 20:17:31,965 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 118650 2023-11-19 20:17:34,941 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.85 vs. limit=12.0 2023-11-19 20:17:47,263 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 10450, loss[loss=0.08153, simple_loss=0.1057, pruned_loss=0.01733, audio_tagging_loss=0.01133, over 15481.00 frames. ], tot_loss[loss=0.08476, simple_loss=0.1039, pruned_loss=0.0221, audio_tagging_loss=0.01069, over 3050113.84 frames. ], batch size: 56, lr: 6.80e-03, grad_scale: 16.0 2023-11-19 20:18:21,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=791200.0, ans=0.0 2023-11-19 20:18:23,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=791200.0, ans=0.1 2023-11-19 20:18:36,691 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 118700 2023-11-19 20:18:47,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=791333.3333333334, ans=0.125 2023-11-19 20:18:52,556 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 10500, loss[loss=0.05705, simple_loss=0.06801, pruned_loss=0.01357, audio_tagging_loss=0.009471, over 16753.00 frames. ], tot_loss[loss=0.08409, simple_loss=0.1032, pruned_loss=0.02198, audio_tagging_loss=0.01049, over 3052585.96 frames. 
], batch size: 69, lr: 6.80e-03, grad_scale: 16.0 2023-11-19 20:19:10,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=791466.6666666666, ans=0.125 2023-11-19 20:19:12,233 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.63 vs. limit=15.0 2023-11-19 20:19:28,551 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.886e+01 8.049e+01 8.549e+01 9.577e+01 1.136e+02, threshold=1.710e+02, percent-clipped=0.0 2023-11-19 20:19:42,052 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 118750 2023-11-19 20:19:56,803 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 10550, loss[loss=0.08704, simple_loss=0.1047, pruned_loss=0.02484, audio_tagging_loss=0.00987, over 14168.00 frames. ], tot_loss[loss=0.08448, simple_loss=0.104, pruned_loss=0.02211, audio_tagging_loss=0.01038, over 3049717.37 frames. ], batch size: 54, lr: 6.80e-03, grad_scale: 16.0 2023-11-19 20:20:19,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=791800.0, ans=15.0 2023-11-19 20:20:21,255 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.17 vs. limit=6.0 2023-11-19 20:20:34,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=791866.6666666666, ans=0.2 2023-11-19 20:20:46,406 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 118800 2023-11-19 20:20:53,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=792000.0, ans=0.0 2023-11-19 20:21:00,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=792000.0, ans=0.0 2023-11-19 20:21:02,565 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 10600, loss[loss=0.08582, simple_loss=0.1113, pruned_loss=0.02074, audio_tagging_loss=0.009428, over 15275.00 frames. ], tot_loss[loss=0.08471, simple_loss=0.1047, pruned_loss=0.02214, audio_tagging_loss=0.01022, over 3041943.77 frames. 
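
The grad_scale field in the batch summaries flips between 16.0 and 32.0 through this stretch, which is the usual behaviour of a dynamic fp16 loss scaler: the scale halves whenever a step produces inf/NaN gradients and doubles back after a long enough run of clean steps. Generic PyTorch AMP usage showing where that number comes from (illustrative loop, not the train_asr.py code; needs a GPU):

import torch

model = torch.nn.Linear(80, 500).cuda()
opt = torch.optim.SGD(model.parameters(), lr=6.8e-3)
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

x = torch.randn(56, 80, device="cuda")
with torch.cuda.amp.autocast():
    loss = model(x).square().mean()
scaler.scale(loss).backward()  # backward pass on the scaled loss
scaler.step(opt)               # step is skipped if grads overflowed
scaler.update()                # halve on overflow, grow after clean runs
print(scaler.get_scale())      # the value logged as grad_scale
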
], batch size: 59, lr: 6.80e-03, grad_scale: 16.0 2023-11-19 20:21:20,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=792133.3333333334, ans=0.125 2023-11-19 20:21:23,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=792133.3333333334, ans=0.2 2023-11-19 20:21:37,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=792200.0, ans=0.0 2023-11-19 20:21:37,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=792200.0, ans=0.1 2023-11-19 20:21:38,216 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.354e+01 8.305e+01 8.737e+01 9.475e+01 2.195e+02, threshold=1.747e+02, percent-clipped=1.0 2023-11-19 20:21:41,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=792266.6666666666, ans=0.0 2023-11-19 20:21:51,985 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 118850 2023-11-19 20:22:07,730 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 10650, loss[loss=0.06424, simple_loss=0.07796, pruned_loss=0.0151, audio_tagging_loss=0.01016, over 15985.00 frames. ], tot_loss[loss=0.08507, simple_loss=0.1049, pruned_loss=0.0224, audio_tagging_loss=0.01021, over 3045958.94 frames. ], batch size: 62, lr: 6.80e-03, grad_scale: 16.0 2023-11-19 20:22:38,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=792533.3333333334, ans=0.125 2023-11-19 20:22:42,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=792533.3333333334, ans=0.125 2023-11-19 20:22:53,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=792600.0, ans=0.125 2023-11-19 20:22:57,333 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 118900 2023-11-19 20:22:58,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=792666.6666666666, ans=0.0 2023-11-19 20:23:01,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=792666.6666666666, ans=0.0 2023-11-19 20:23:01,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=792666.6666666666, ans=0.07 2023-11-19 20:23:02,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=792666.6666666666, ans=0.2 2023-11-19 20:23:02,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=792666.6666666666, ans=0.125 2023-11-19 20:23:12,199 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 10700, loss[loss=0.06941, simple_loss=0.08906, pruned_loss=0.01542, audio_tagging_loss=0.009461, over 15124.00 frames. ], tot_loss[loss=0.0843, simple_loss=0.1039, pruned_loss=0.0222, audio_tagging_loss=0.01014, over 3042756.45 frames. 
], batch size: 56, lr: 6.80e-03, grad_scale: 16.0 2023-11-19 20:23:48,371 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.590e+01 8.344e+01 9.124e+01 9.874e+01 1.194e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-19 20:24:00,576 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 118950 2023-11-19 20:24:16,651 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 10750, loss[loss=0.09327, simple_loss=0.1115, pruned_loss=0.02621, audio_tagging_loss=0.0113, over 15359.00 frames. ], tot_loss[loss=0.08458, simple_loss=0.1044, pruned_loss=0.02226, audio_tagging_loss=0.01012, over 3044615.62 frames. ], batch size: 58, lr: 6.80e-03, grad_scale: 16.0 2023-11-19 20:24:24,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=793066.6666666666, ans=0.1 2023-11-19 20:24:32,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=793133.3333333334, ans=0.2 2023-11-19 20:24:35,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=793133.3333333334, ans=0.125 2023-11-19 20:24:46,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=793200.0, ans=0.0 2023-11-19 20:25:04,903 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 119000 2023-11-19 20:25:07,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=793333.3333333334, ans=0.125 2023-11-19 20:25:21,033 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 10800, loss[loss=0.07258, simple_loss=0.09105, pruned_loss=0.0166, audio_tagging_loss=0.01045, over 14287.00 frames. ], tot_loss[loss=0.08421, simple_loss=0.1043, pruned_loss=0.02194, audio_tagging_loss=0.01011, over 3056856.93 frames. ], batch size: 54, lr: 6.79e-03, grad_scale: 32.0 2023-11-19 20:25:36,375 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.85 vs. limit=15.0 2023-11-19 20:25:46,201 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.71 vs. limit=15.0 2023-11-19 20:25:56,697 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.855e+01 8.324e+01 9.413e+01 1.037e+02 1.353e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-19 20:26:01,200 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.86 vs. limit=22.5 2023-11-19 20:26:10,133 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 119050 2023-11-19 20:26:23,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=793733.3333333334, ans=0.0 2023-11-19 20:26:24,869 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 10850, loss[loss=0.06919, simple_loss=0.08218, pruned_loss=0.0167, audio_tagging_loss=0.0114, over 14708.00 frames. ], tot_loss[loss=0.08423, simple_loss=0.1041, pruned_loss=0.02198, audio_tagging_loss=0.01019, over 3052393.39 frames. 
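
The Whitening lines compare a per-module statistic against a limit; the statistic is small when a module's channel covariance is close to white and grows as the variance concentrates in a few directions. The exact formula lives in scaling.py and is not reproduced in the log, so the sketch below uses one simple anisotropy proxy, num_channels * ||Cov||_F^2 / trace(Cov)^2, which is 1.0 for perfectly white features and at most num_channels:

import torch

def whiteness_metric(feats: torch.Tensor) -> float:
    """Anisotropy of the channel covariance of feats (frames x channels).
    Illustrative proxy only, not the literal scaling.py formula."""
    feats = feats - feats.mean(dim=0)
    cov = feats.T @ feats / feats.shape[0]
    c = cov.shape[0]
    return (c * (cov ** 2).sum() / cov.trace() ** 2).item()

x = torch.randn(1000, 384)
print(whiteness_metric(x))   # near 1: channels are already white
x[:, 0] *= 10.0              # concentrate variance in one channel
print(whiteness_metric(x))   # grows, cf. "metric=13.85 vs. limit=15.0"
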
], batch size: 56, lr: 6.79e-03, grad_scale: 32.0 2023-11-19 20:26:27,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=793733.3333333334, ans=0.125 2023-11-19 20:26:38,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=793800.0, ans=0.0 2023-11-19 20:27:13,719 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 119100 2023-11-19 20:27:16,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=794000.0, ans=0.07 2023-11-19 20:27:20,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=794000.0, ans=0.125 2023-11-19 20:27:22,301 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 20:27:28,431 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 10900, loss[loss=0.07571, simple_loss=0.08816, pruned_loss=0.01765, audio_tagging_loss=0.01398, over 15599.00 frames. ], tot_loss[loss=0.08436, simple_loss=0.1041, pruned_loss=0.02203, audio_tagging_loss=0.01028, over 3052739.47 frames. ], batch size: 60, lr: 6.79e-03, grad_scale: 16.0 2023-11-19 20:27:42,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=794133.3333333334, ans=0.1 2023-11-19 20:27:55,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=794200.0, ans=0.0 2023-11-19 20:28:05,856 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.516e+01 8.198e+01 8.697e+01 9.317e+01 1.364e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-19 20:28:11,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=794266.6666666666, ans=0.125 2023-11-19 20:28:16,994 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 119150 2023-11-19 20:28:17,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=794266.6666666666, ans=0.125 2023-11-19 20:28:23,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=794333.3333333334, ans=0.125 2023-11-19 20:28:33,890 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 10950, loss[loss=0.09625, simple_loss=0.1174, pruned_loss=0.02748, audio_tagging_loss=0.01005, over 14641.00 frames. ], tot_loss[loss=0.08421, simple_loss=0.104, pruned_loss=0.02196, audio_tagging_loss=0.01027, over 3051273.18 frames. ], batch size: 55, lr: 6.79e-03, grad_scale: 16.0 2023-11-19 20:28:34,590 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.94 vs. 
limit=15.0 2023-11-19 20:28:35,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=794400.0, ans=0.0 2023-11-19 20:28:41,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=794400.0, ans=0.0 2023-11-19 20:28:51,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=794466.6666666666, ans=0.0 2023-11-19 20:28:56,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=794466.6666666666, ans=0.2 2023-11-19 20:28:58,071 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.57 vs. limit=15.0 2023-11-19 20:29:05,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=794533.3333333334, ans=0.125 2023-11-19 20:29:10,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=794600.0, ans=0.0 2023-11-19 20:29:23,128 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 119200 2023-11-19 20:29:26,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=794666.6666666666, ans=0.125 2023-11-19 20:29:37,873 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 11000, loss[loss=0.115, simple_loss=0.1559, pruned_loss=0.0299, audio_tagging_loss=0.007115, over 14700.00 frames. ], tot_loss[loss=0.08512, simple_loss=0.1055, pruned_loss=0.02221, audio_tagging_loss=0.01017, over 3052017.48 frames. ], batch size: 53, lr: 6.79e-03, grad_scale: 16.0 2023-11-19 20:29:39,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=794733.3333333334, ans=0.0 2023-11-19 20:29:46,464 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
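
The token dumps in these warnings are SentencePiece BPE pieces, with '▁' marking word boundaries; the placeholder sentence shatters into 24 pieces under the run's BPE model, one more than the 23 usable encoder frames. A hedged reproduction (the model path is illustrative, not taken from the log):

import sentencepiece as spm

# Point model_file at the run's BPE model; the path here is a placeholder.
sp = spm.SentencePieceProcessor(model_file="bpe.model")
text = "Dummy text added as a place holder. Please ignore this if possible."
pieces = sp.encode(text, out_type=str)
print(pieces)       # ['▁D', 'ummy', '▁', 'text', ...] as in the warnings
print(len(pieces))  # 24 for the model used in this run
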
Number of tokens: 24 2023-11-19 20:29:51,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=794800.0, ans=0.125 2023-11-19 20:29:54,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=794800.0, ans=0.0 2023-11-19 20:29:59,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=794800.0, ans=0.125 2023-11-19 20:30:12,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=794866.6666666666, ans=0.1 2023-11-19 20:30:15,771 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.738e+01 8.172e+01 9.020e+01 9.721e+01 1.365e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-19 20:30:27,061 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 119250 2023-11-19 20:30:41,681 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 11050, loss[loss=0.09723, simple_loss=0.1141, pruned_loss=0.02869, audio_tagging_loss=0.0115, over 16036.00 frames. ], tot_loss[loss=0.08555, simple_loss=0.1055, pruned_loss=0.02251, audio_tagging_loss=0.01028, over 3053886.29 frames. ], batch size: 59, lr: 6.79e-03, grad_scale: 16.0 2023-11-19 20:30:55,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=795133.3333333334, ans=0.125 2023-11-19 20:31:21,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=795266.6666666666, ans=0.0 2023-11-19 20:31:29,805 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 119300 2023-11-19 20:31:29,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=795266.6666666666, ans=0.0 2023-11-19 20:31:30,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=795266.6666666666, ans=0.07 2023-11-19 20:31:44,865 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 11100, loss[loss=0.09059, simple_loss=0.113, pruned_loss=0.02363, audio_tagging_loss=0.01045, over 14989.00 frames. ], tot_loss[loss=0.08581, simple_loss=0.1057, pruned_loss=0.02258, audio_tagging_loss=0.01039, over 3061387.62 frames. 
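
The lr field decays very slowly here (6.84e-03 down to 6.78e-03 across the section), consistent with icefall's Eden schedule, which discounts the base learning rate by both a batch term and an epoch term. The sketch below is a reconstruction of that formula; the constants are this run's configured values, and epoch=9.0 is a back-solved assumption chosen to reproduce the logged rate:

def eden_lr(base_lr: float, step: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # lr = base_lr * batch_factor * epoch_factor (Eden, as reconstructed)
    batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(eden_lr(0.045, step=118000, epoch=9.0))  # ~6.82e-03, cf. lr: 6.81e-03
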
], batch size: 56, lr: 6.79e-03, grad_scale: 16.0 2023-11-19 20:31:45,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=795400.0, ans=0.0 2023-11-19 20:31:48,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=795400.0, ans=0.125 2023-11-19 20:31:48,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=795400.0, ans=0.0 2023-11-19 20:31:50,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=795400.0, ans=0.125 2023-11-19 20:32:21,500 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.097e+01 8.553e+01 9.407e+01 1.079e+02 1.400e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-19 20:32:33,125 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 119350 2023-11-19 20:32:37,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=795666.6666666666, ans=0.2 2023-11-19 20:32:49,462 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 11150, loss[loss=0.08995, simple_loss=0.1063, pruned_loss=0.02553, audio_tagging_loss=0.0113, over 15820.00 frames. ], tot_loss[loss=0.08645, simple_loss=0.1062, pruned_loss=0.02284, audio_tagging_loss=0.0105, over 3059737.48 frames. ], batch size: 65, lr: 6.78e-03, grad_scale: 16.0 2023-11-19 20:33:09,235 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.16 vs. limit=15.0 2023-11-19 20:33:21,819 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.81 vs. limit=15.0 2023-11-19 20:33:26,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=795933.3333333334, ans=0.1 2023-11-19 20:33:37,737 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 119400 2023-11-19 20:33:46,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=796000.0, ans=0.1 2023-11-19 20:33:52,416 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 11200, loss[loss=0.07163, simple_loss=0.08834, pruned_loss=0.01549, audio_tagging_loss=0.01197, over 15070.00 frames. ], tot_loss[loss=0.08533, simple_loss=0.1046, pruned_loss=0.02242, audio_tagging_loss=0.01061, over 3056581.72 frames. ], batch size: 57, lr: 6.78e-03, grad_scale: 32.0 2023-11-19 20:33:55,401 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.98 vs. 
limit=12.0 2023-11-19 20:34:08,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=796133.3333333334, ans=0.125 2023-11-19 20:34:14,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=796133.3333333334, ans=0.125 2023-11-19 20:34:19,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=796200.0, ans=0.125 2023-11-19 20:34:29,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=796200.0, ans=0.125 2023-11-19 20:34:30,415 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.715e+01 8.063e+01 8.813e+01 9.513e+01 1.484e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-19 20:34:41,325 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 119450 2023-11-19 20:34:45,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=796333.3333333334, ans=0.2 2023-11-19 20:34:53,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=796333.3333333334, ans=0.125 2023-11-19 20:34:56,372 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 11250, loss[loss=0.08571, simple_loss=0.1085, pruned_loss=0.02256, audio_tagging_loss=0.008915, over 15189.00 frames. ], tot_loss[loss=0.08523, simple_loss=0.1044, pruned_loss=0.02243, audio_tagging_loss=0.01059, over 3056946.24 frames. ], batch size: 55, lr: 6.78e-03, grad_scale: 32.0 2023-11-19 20:35:00,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=796400.0, ans=0.0 2023-11-19 20:35:38,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=796600.0, ans=0.125 2023-11-19 20:35:39,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=796600.0, ans=0.125 2023-11-19 20:35:45,345 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 119500 2023-11-19 20:36:01,669 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 11300, loss[loss=0.07262, simple_loss=0.09103, pruned_loss=0.01757, audio_tagging_loss=0.009539, over 15088.00 frames. ], tot_loss[loss=0.08575, simple_loss=0.1053, pruned_loss=0.02272, audio_tagging_loss=0.01036, over 3055562.84 frames. ], batch size: 58, lr: 6.78e-03, grad_scale: 32.0 2023-11-19 20:36:02,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=796733.3333333334, ans=0.025 2023-11-19 20:36:09,858 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.56 vs. limit=15.0 2023-11-19 20:36:27,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=796866.6666666666, ans=0.5 2023-11-19 20:36:28,801 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.45 vs. 
limit=22.5 2023-11-19 20:36:38,561 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.085e+01 8.176e+01 8.864e+01 9.663e+01 1.255e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-19 20:36:50,750 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 119550 2023-11-19 20:37:05,293 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 11350, loss[loss=0.1118, simple_loss=0.1378, pruned_loss=0.03582, audio_tagging_loss=0.007107, over 15299.00 frames. ], tot_loss[loss=0.08617, simple_loss=0.106, pruned_loss=0.02295, audio_tagging_loss=0.01022, over 3054260.72 frames. ], batch size: 57, lr: 6.78e-03, grad_scale: 32.0 2023-11-19 20:37:18,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=797133.3333333334, ans=0.0 2023-11-19 20:37:19,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=797133.3333333334, ans=0.125 2023-11-19 20:37:33,921 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 20:37:35,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=797200.0, ans=0.125 2023-11-19 20:37:54,014 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 119600 2023-11-19 20:37:56,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=797333.3333333334, ans=0.1 2023-11-19 20:38:06,433 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.25 vs. limit=15.0 2023-11-19 20:38:09,485 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 11400, loss[loss=0.09667, simple_loss=0.1208, pruned_loss=0.02733, audio_tagging_loss=0.008921, over 14686.00 frames. ], tot_loss[loss=0.08523, simple_loss=0.1049, pruned_loss=0.02252, audio_tagging_loss=0.01028, over 3054962.35 frames. ], batch size: 53, lr: 6.78e-03, grad_scale: 32.0 2023-11-19 20:38:12,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=797400.0, ans=0.125 2023-11-19 20:38:30,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=797466.6666666666, ans=0.125 2023-11-19 20:38:46,565 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.427e+01 8.335e+01 9.100e+01 1.007e+02 1.269e+02, threshold=1.820e+02, percent-clipped=0.0 2023-11-19 20:38:47,010 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 20:38:49,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=797600.0, ans=0.125 2023-11-19 20:38:52,262 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.53 vs. 
limit=15.0 2023-11-19 20:38:58,336 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 119650 2023-11-19 20:39:02,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=797666.6666666666, ans=0.2 2023-11-19 20:39:14,311 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 11450, loss[loss=0.07314, simple_loss=0.08891, pruned_loss=0.01512, audio_tagging_loss=0.01357, over 14602.00 frames. ], tot_loss[loss=0.08494, simple_loss=0.1045, pruned_loss=0.02238, audio_tagging_loss=0.01031, over 3052863.78 frames. ], batch size: 55, lr: 6.78e-03, grad_scale: 32.0 2023-11-19 20:39:33,794 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.32 vs. limit=15.0 2023-11-19 20:40:03,129 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 119700 2023-11-19 20:40:09,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=798000.0, ans=0.0 2023-11-19 20:40:10,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=798000.0, ans=0.1 2023-11-19 20:40:10,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=798000.0, ans=0.0 2023-11-19 20:40:15,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=798000.0, ans=0.5 2023-11-19 20:40:18,258 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 11500, loss[loss=0.07318, simple_loss=0.09046, pruned_loss=0.017, audio_tagging_loss=0.01096, over 15232.00 frames. ], tot_loss[loss=0.08442, simple_loss=0.1037, pruned_loss=0.02214, audio_tagging_loss=0.01043, over 3050619.21 frames. 
], batch size: 59, lr: 6.77e-03, grad_scale: 32.0
2023-11-19 20:40:18,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=798066.6666666666, ans=0.125
2023-11-19 20:40:22,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=798066.6666666666, ans=0.125
2023-11-19 20:40:28,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=798066.6666666666, ans=0.0
2023-11-19 20:40:35,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=798133.3333333334, ans=0.0
2023-11-19 20:40:37,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=798133.3333333334, ans=0.1
2023-11-19 20:40:38,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=798133.3333333334, ans=0.125
2023-11-19 20:40:39,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=798133.3333333334, ans=0.125
2023-11-19 20:40:55,534 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.898e+01 8.252e+01 9.046e+01 9.695e+01 1.384e+02, threshold=1.809e+02, percent-clipped=0.0
2023-11-19 20:41:07,291 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 119750
2023-11-19 20:41:08,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=798333.3333333334, ans=0.2
2023-11-19 20:41:12,835 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.87 vs. limit=22.5
2023-11-19 20:41:22,484 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 11550, loss[loss=0.1229, simple_loss=0.1402, pruned_loss=0.04233, audio_tagging_loss=0.01049, over 15492.00 frames. ], tot_loss[loss=0.08548, simple_loss=0.1052, pruned_loss=0.0225, audio_tagging_loss=0.0104, over 3051523.26 frames. ], batch size: 58, lr: 6.77e-03, grad_scale: 32.0
2023-11-19 20:41:31,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=798400.0, ans=0.0
2023-11-19 20:41:36,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=798466.6666666666, ans=0.1
2023-11-19 20:41:37,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=798466.6666666666, ans=0.2
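Note on the [optim.py:476] entries: each reports five grad-norm quantiles (min, 25%, median, 75%, max) over recent batches, and in every entry above the clipping threshold equals Clipping_scale times the logged median (here 2.0 * 9.046e+01 = 1.809e+02), with percent-clipped tracking how often recent gradient norms exceeded that threshold. A minimal sketch of that bookkeeping; the window handling and the helper name are assumptions that merely reproduce the logged numbers, not icefall's actual optim.py logic:

```python
import numpy as np

def clipping_stats(grad_norms: list[float], clipping_scale: float = 2.0):
    """Quantiles of recent gradient norms and the implied clip threshold.

    The median-based rule is inferred from the logged values
    (threshold == clipping_scale * median in the entries above).
    """
    q = np.quantile(grad_norms, [0.0, 0.25, 0.5, 0.75, 1.0])
    threshold = clipping_scale * q[2]
    percent_clipped = 100.0 * float(np.mean(np.asarray(grad_norms) > threshold))
    return q, threshold, percent_clipped
```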
2023-11-19 20:41:58,360 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 20:42:02,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=798600.0, ans=0.125
2023-11-19 20:42:05,074 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.75 vs. limit=15.0
2023-11-19 20:42:10,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=798600.0, ans=0.125
2023-11-19 20:42:11,173 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 119800
2023-11-19 20:42:11,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=798600.0, ans=0.125
2023-11-19 20:42:27,331 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 11600, loss[loss=0.09115, simple_loss=0.1185, pruned_loss=0.02504, audio_tagging_loss=0.006868, over 16894.00 frames. ], tot_loss[loss=0.08573, simple_loss=0.1058, pruned_loss=0.02255, audio_tagging_loss=0.01029, over 3055310.62 frames. ], batch size: 61, lr: 6.77e-03, grad_scale: 32.0
2023-11-19 20:42:41,953 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0
2023-11-19 20:42:46,016 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.81 vs. limit=22.5
2023-11-19 20:43:04,744 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.840e+01 8.658e+01 9.480e+01 1.096e+02 1.560e+02, threshold=1.896e+02, percent-clipped=0.0
2023-11-19 20:43:14,006 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.99 vs. limit=15.0
2023-11-19 20:43:14,170 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.67 vs. limit=12.0
2023-11-19 20:43:15,991 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 119850
2023-11-19 20:43:31,177 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 11650, loss[loss=0.08488, simple_loss=0.1003, pruned_loss=0.0212, audio_tagging_loss=0.01352, over 15242.00 frames. ], tot_loss[loss=0.08616, simple_loss=0.1064, pruned_loss=0.02265, audio_tagging_loss=0.01031, over 3050310.91 frames.
], batch size: 57, lr: 6.77e-03, grad_scale: 32.0 2023-11-19 20:43:37,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=799066.6666666666, ans=0.125 2023-11-19 20:43:41,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=799066.6666666666, ans=0.0 2023-11-19 20:43:46,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=799133.3333333334, ans=0.0 2023-11-19 20:43:58,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=799200.0, ans=0.0 2023-11-19 20:44:07,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=799200.0, ans=0.2 2023-11-19 20:44:14,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=799266.6666666666, ans=0.0 2023-11-19 20:44:19,888 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 119900 2023-11-19 20:44:35,532 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 11700, loss[loss=0.08038, simple_loss=0.09229, pruned_loss=0.01993, audio_tagging_loss=0.01429, over 14868.00 frames. ], tot_loss[loss=0.08559, simple_loss=0.1055, pruned_loss=0.02248, audio_tagging_loss=0.01037, over 3052700.34 frames. ], batch size: 56, lr: 6.77e-03, grad_scale: 32.0 2023-11-19 20:44:42,350 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.54 vs. limit=15.0 2023-11-19 20:44:57,426 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.10 vs. limit=15.0 2023-11-19 20:45:14,280 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.866e+01 8.190e+01 9.042e+01 9.907e+01 1.390e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-19 20:45:14,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=799600.0, ans=0.125 2023-11-19 20:45:20,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=799600.0, ans=0.125 2023-11-19 20:45:23,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=799600.0, ans=0.0 2023-11-19 20:45:24,733 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 119950 2023-11-19 20:45:40,593 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 11750, loss[loss=0.08488, simple_loss=0.1017, pruned_loss=0.02283, audio_tagging_loss=0.01117, over 16836.00 frames. ], tot_loss[loss=0.08632, simple_loss=0.1065, pruned_loss=0.02278, audio_tagging_loss=0.01031, over 3052277.21 frames. ], batch size: 63, lr: 6.77e-03, grad_scale: 32.0 2023-11-19 20:46:02,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=799800.0, ans=0.125 2023-11-19 20:46:10,210 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.34 vs. 
limit=15.0 2023-11-19 20:46:12,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=799866.6666666666, ans=0.2 2023-11-19 20:46:29,882 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 120000 2023-11-19 20:46:31,380 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-120000.pt 2023-11-19 20:46:47,859 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 11800, loss[loss=0.08074, simple_loss=0.09827, pruned_loss=0.02335, audio_tagging_loss=0.008259, over 16229.00 frames. ], tot_loss[loss=0.08598, simple_loss=0.106, pruned_loss=0.02266, audio_tagging_loss=0.01032, over 3047866.17 frames. ], batch size: 60, lr: 6.77e-03, grad_scale: 32.0 2023-11-19 20:47:02,718 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.66 vs. limit=15.0 2023-11-19 20:47:19,819 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=6.577e-01 2023-11-19 20:47:26,903 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.456e+01 9.093e+01 9.839e+01 1.192e+02, threshold=1.819e+02, percent-clipped=0.0 2023-11-19 20:47:36,806 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 120050 2023-11-19 20:47:38,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=800333.3333333334, ans=0.1 2023-11-19 20:47:39,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=800333.3333333334, ans=0.2 2023-11-19 20:47:40,794 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 20:47:47,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=800333.3333333334, ans=0.0 2023-11-19 20:47:51,930 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 11850, loss[loss=0.0995, simple_loss=0.1188, pruned_loss=0.02974, audio_tagging_loss=0.01037, over 15278.00 frames. ], tot_loss[loss=0.08555, simple_loss=0.1052, pruned_loss=0.02251, audio_tagging_loss=0.01045, over 3047675.24 frames. ], batch size: 58, lr: 6.76e-03, grad_scale: 32.0 2023-11-19 20:48:16,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=800466.6666666666, ans=0.125 2023-11-19 20:48:22,966 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.85 vs. limit=15.0 2023-11-19 20:48:36,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=800600.0, ans=0.1 2023-11-19 20:48:36,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=800600.0, ans=0.125 2023-11-19 20:48:40,948 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 120100 2023-11-19 20:48:57,227 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 11900, loss[loss=0.06836, simple_loss=0.08033, pruned_loss=0.01768, audio_tagging_loss=0.01051, over 14307.00 frames. ], tot_loss[loss=0.08491, simple_loss=0.1043, pruned_loss=0.02213, audio_tagging_loss=0.01061, over 3053412.90 frames. 
], batch size: 54, lr: 6.76e-03, grad_scale: 32.0 2023-11-19 20:49:04,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=800733.3333333334, ans=0.0 2023-11-19 20:49:17,218 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.05 vs. limit=10.0 2023-11-19 20:49:35,957 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.865e+01 8.251e+01 9.063e+01 9.820e+01 1.973e+02, threshold=1.813e+02, percent-clipped=1.0 2023-11-19 20:49:40,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=800933.3333333334, ans=0.0 2023-11-19 20:49:45,806 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 120150 2023-11-19 20:50:00,548 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 11950, loss[loss=0.06579, simple_loss=0.08502, pruned_loss=0.01319, audio_tagging_loss=0.0101, over 15618.00 frames. ], tot_loss[loss=0.08462, simple_loss=0.1039, pruned_loss=0.02198, audio_tagging_loss=0.01069, over 3054240.20 frames. ], batch size: 57, lr: 6.76e-03, grad_scale: 16.0 2023-11-19 20:50:37,137 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.76 vs. limit=12.0 2023-11-19 20:50:40,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=801266.6666666666, ans=0.0 2023-11-19 20:50:41,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=801266.6666666666, ans=0.125 2023-11-19 20:50:48,465 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 120200 2023-11-19 20:51:02,555 INFO [train_asr.py:1262] (0/4) Epoch 10, batch 12000, loss[loss=0.06367, simple_loss=0.06809, pruned_loss=0.01687, audio_tagging_loss=0.01276, over 15984.00 frames. ], tot_loss[loss=0.08482, simple_loss=0.1039, pruned_loss=0.02217, audio_tagging_loss=0.0107, over 3050733.38 frames. ], batch size: 62, lr: 6.76e-03, grad_scale: 32.0 2023-11-19 20:51:02,596 INFO [train_asr.py:1285] (0/4) Computing validation loss 2023-11-19 20:51:31,492 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6560, 3.6164, 3.7296, 3.3223], device='cuda:0') 2023-11-19 20:51:41,793 INFO [train_asr.py:1294] (0/4) Epoch 10, validation: loss=0.06456, simple_loss=0.05518, pruned_loss=0.006322, audio_tagging_loss=0.03065, over 4681554.00 frames. 
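Note on the "tot_loss[..., over N frames.]" and "validation: loss=..." entries: the bracketed values read as frame-weighted averages, i.e. per-batch losses weighted by their frame counts and divided by the accumulated frame total (the validation pass above averages over 4,681,554 frames, and 0.5 * 0.05518 + 0.006322 + 0.03065 = 0.06456 matches the combined validation loss). Below is a simplified tracker along those lines, assuming plain accumulation; the training-side tot_loss may additionally be decayed or reset periodically, so this is a stand-in, not the code behind the log:

```python
from collections import defaultdict

class LossTracker:
    """Frame-weighted running averages, a simplified stand-in for the
    tracker behind the 'over N frames' figures in the log."""

    def __init__(self) -> None:
        self.sums: dict[str, float] = defaultdict(float)
        self.frames = 0.0

    def update(self, num_frames: float, **losses: float) -> None:
        # Accumulate frame-weighted sums for each named loss component.
        for name, value in losses.items():
            self.sums[name] += value * num_frames
        self.frames += num_frames

    def averages(self) -> dict[str, float]:
        # The values printed in the log would be these weighted means.
        return {name: s / self.frames for name, s in self.sums.items()}
```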
2023-11-19 20:51:41,793 INFO [train_asr.py:1295] (0/4) Maximum memory allocated so far is 25925MB 2023-11-19 20:51:46,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=801400.0, ans=0.0 2023-11-19 20:51:56,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=801466.6666666666, ans=15.0 2023-11-19 20:51:57,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=801466.6666666666, ans=0.2 2023-11-19 20:52:07,448 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-10.pt 2023-11-19 20:52:44,649 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 0, loss[loss=0.1026, simple_loss=0.1036, pruned_loss=0.02163, audio_tagging_loss=0.02916, over 13799.00 frames. ], tot_loss[loss=0.1026, simple_loss=0.1036, pruned_loss=0.02163, audio_tagging_loss=0.02916, over 13799.00 frames. ], batch size: 57, lr: 6.45e-03, grad_scale: 32.0 2023-11-19 20:52:44,652 INFO [train_asr.py:1285] (0/4) Computing validation loss 2023-11-19 20:53:07,307 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.5407, 3.6566, 4.3332, 3.2384], device='cuda:0') 2023-11-19 20:53:20,002 INFO [train_asr.py:1294] (0/4) Epoch 11, validation: loss=0.06409, simple_loss=0.05518, pruned_loss=0.006264, audio_tagging_loss=0.03024, over 4681554.00 frames. 2023-11-19 20:53:20,003 INFO [train_asr.py:1295] (0/4) Maximum memory allocated so far is 25925MB 2023-11-19 20:53:31,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=801606.6666666666, ans=0.1 2023-11-19 20:53:32,239 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.455e+01 8.493e+01 9.059e+01 9.664e+01 1.642e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-19 20:53:41,116 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 120250 2023-11-19 20:53:51,206 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 20:53:51,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=801673.3333333334, ans=0.125 2023-11-19 20:53:52,952 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.66 vs. limit=22.5 2023-11-19 20:53:56,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=801673.3333333334, ans=0.0 2023-11-19 20:54:17,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=801806.6666666666, ans=0.125 2023-11-19 20:54:24,296 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 50, loss[loss=0.1104, simple_loss=0.1359, pruned_loss=0.02685, audio_tagging_loss=0.0156, over 15898.00 frames. ], tot_loss[loss=0.09181, simple_loss=0.1003, pruned_loss=0.02072, audio_tagging_loss=0.02095, over 690125.79 frames. 
], batch size: 55, lr: 6.45e-03, grad_scale: 32.0 2023-11-19 20:54:30,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=801873.3333333334, ans=0.125 2023-11-19 20:54:34,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=801873.3333333334, ans=0.125 2023-11-19 20:54:45,934 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.06 vs. limit=15.0 2023-11-19 20:54:46,563 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 120300 2023-11-19 20:55:00,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=802006.6666666666, ans=0.02 2023-11-19 20:55:00,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=802006.6666666666, ans=0.125 2023-11-19 20:55:10,490 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.79 vs. limit=15.0 2023-11-19 20:55:17,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=802140.0, ans=0.2 2023-11-19 20:55:30,009 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 100, loss[loss=0.06588, simple_loss=0.07523, pruned_loss=0.01064, audio_tagging_loss=0.01763, over 15116.00 frames. ], tot_loss[loss=0.08985, simple_loss=0.0988, pruned_loss=0.02069, audio_tagging_loss=0.01976, over 1211976.09 frames. ], batch size: 57, lr: 6.45e-03, grad_scale: 32.0 2023-11-19 20:55:44,021 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.071e+01 8.908e+01 9.605e+01 1.032e+02 1.207e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-19 20:55:52,806 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 120350 2023-11-19 20:56:00,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=802340.0, ans=0.2 2023-11-19 20:56:19,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=802406.6666666666, ans=0.0 2023-11-19 20:56:35,848 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 150, loss[loss=0.0789, simple_loss=0.09303, pruned_loss=0.0175, audio_tagging_loss=0.01489, over 15616.00 frames. ], tot_loss[loss=0.08897, simple_loss=0.1005, pruned_loss=0.02105, audio_tagging_loss=0.01765, over 1618755.06 frames. ], batch size: 60, lr: 6.45e-03, grad_scale: 32.0 2023-11-19 20:56:41,683 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.44 vs. 
limit=15.0 2023-11-19 20:56:57,465 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 120400 2023-11-19 20:56:57,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=802606.6666666666, ans=0.025 2023-11-19 20:57:02,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=802673.3333333334, ans=0.2 2023-11-19 20:57:06,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=802673.3333333334, ans=0.0 2023-11-19 20:57:12,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=802673.3333333334, ans=0.125 2023-11-19 20:57:16,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=802740.0, ans=0.125 2023-11-19 20:57:19,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=802740.0, ans=0.0 2023-11-19 20:57:31,656 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.51 vs. limit=15.0 2023-11-19 20:57:38,097 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.53 vs. limit=15.0 2023-11-19 20:57:40,930 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 200, loss[loss=0.0872, simple_loss=0.1011, pruned_loss=0.02593, audio_tagging_loss=0.01072, over 15971.00 frames. ], tot_loss[loss=0.08756, simple_loss=0.1013, pruned_loss=0.02132, audio_tagging_loss=0.01559, over 1939729.10 frames. ], batch size: 61, lr: 6.45e-03, grad_scale: 32.0 2023-11-19 20:57:54,015 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.513e+01 8.370e+01 8.919e+01 1.001e+02 1.772e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-19 20:57:59,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=802940.0, ans=0.125 2023-11-19 20:58:02,720 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 120450 2023-11-19 20:58:29,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=803073.3333333334, ans=0.125 2023-11-19 20:58:30,983 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 20:58:46,049 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 250, loss[loss=0.08463, simple_loss=0.1114, pruned_loss=0.02097, audio_tagging_loss=0.007943, over 15363.00 frames. ], tot_loss[loss=0.08676, simple_loss=0.1023, pruned_loss=0.02177, audio_tagging_loss=0.01385, over 2189767.22 frames. 
], batch size: 57, lr: 6.45e-03, grad_scale: 16.0 2023-11-19 20:58:52,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=803206.6666666666, ans=0.125 2023-11-19 20:58:58,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=803273.3333333334, ans=0.1 2023-11-19 20:59:02,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=803273.3333333334, ans=0.125 2023-11-19 20:59:07,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=803273.3333333334, ans=22.5 2023-11-19 20:59:09,471 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 120500 2023-11-19 20:59:30,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=803406.6666666666, ans=0.0 2023-11-19 20:59:35,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=803406.6666666666, ans=0.125 2023-11-19 20:59:48,440 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.99 vs. limit=15.0 2023-11-19 20:59:52,072 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 300, loss[loss=0.07108, simple_loss=0.08551, pruned_loss=0.01573, audio_tagging_loss=0.0126, over 15546.00 frames. ], tot_loss[loss=0.08568, simple_loss=0.1024, pruned_loss=0.02167, audio_tagging_loss=0.01281, over 2379885.78 frames. ], batch size: 58, lr: 6.45e-03, grad_scale: 16.0 2023-11-19 20:59:53,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=803540.0, ans=0.125 2023-11-19 21:00:05,715 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.754e+01 8.387e+01 8.949e+01 9.814e+01 1.274e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-19 21:00:08,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=803606.6666666666, ans=0.0 2023-11-19 21:00:13,386 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 120550 2023-11-19 21:00:26,966 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.10 vs. limit=15.0 2023-11-19 21:00:52,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=803806.6666666666, ans=0.2 2023-11-19 21:00:54,240 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.26 vs. limit=22.5 2023-11-19 21:00:56,039 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 350, loss[loss=0.08146, simple_loss=0.1038, pruned_loss=0.02074, audio_tagging_loss=0.008819, over 14644.00 frames. ], tot_loss[loss=0.08467, simple_loss=0.1019, pruned_loss=0.02156, audio_tagging_loss=0.01214, over 2521708.77 frames. 
], batch size: 54, lr: 6.44e-03, grad_scale: 16.0 2023-11-19 21:01:02,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=803873.3333333334, ans=0.125 2023-11-19 21:01:08,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=803940.0, ans=0.125 2023-11-19 21:01:17,864 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 120600 2023-11-19 21:01:38,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=804073.3333333334, ans=0.125 2023-11-19 21:01:46,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=804073.3333333334, ans=0.125 2023-11-19 21:01:51,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=804140.0, ans=0.125 2023-11-19 21:02:01,538 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 400, loss[loss=0.07578, simple_loss=0.0934, pruned_loss=0.02121, audio_tagging_loss=0.007872, over 16138.00 frames. ], tot_loss[loss=0.08493, simple_loss=0.1031, pruned_loss=0.0218, audio_tagging_loss=0.01156, over 2644334.23 frames. ], batch size: 60, lr: 6.44e-03, grad_scale: 32.0 2023-11-19 21:02:08,261 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.19 vs. limit=15.0 2023-11-19 21:02:15,707 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.819e+01 8.262e+01 8.810e+01 9.660e+01 1.540e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-19 21:02:24,497 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 120650 2023-11-19 21:02:28,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=804340.0, ans=0.125 2023-11-19 21:02:29,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=804340.0, ans=0.1 2023-11-19 21:02:34,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=804340.0, ans=0.0 2023-11-19 21:02:51,338 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 21:03:06,992 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 450, loss[loss=0.07402, simple_loss=0.1009, pruned_loss=0.01416, audio_tagging_loss=0.009431, over 15883.00 frames. ], tot_loss[loss=0.08443, simple_loss=0.1033, pruned_loss=0.02161, audio_tagging_loss=0.01118, over 2727553.18 frames. ], batch size: 58, lr: 6.44e-03, grad_scale: 32.0 2023-11-19 21:03:17,026 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.32 vs. limit=15.0 2023-11-19 21:03:28,975 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 120700 2023-11-19 21:03:34,603 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.98 vs. 
limit=15.0 2023-11-19 21:03:50,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=804740.0, ans=0.0 2023-11-19 21:03:53,797 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.20 vs. limit=22.5 2023-11-19 21:04:12,478 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 500, loss[loss=0.09404, simple_loss=0.1153, pruned_loss=0.02802, audio_tagging_loss=0.008348, over 14091.00 frames. ], tot_loss[loss=0.08457, simple_loss=0.1035, pruned_loss=0.02185, audio_tagging_loss=0.01096, over 2798903.48 frames. ], batch size: 53, lr: 6.44e-03, grad_scale: 32.0 2023-11-19 21:04:26,066 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.811e+01 8.498e+01 9.308e+01 1.026e+02 1.855e+02, threshold=1.862e+02, percent-clipped=1.0 2023-11-19 21:04:34,114 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 120750 2023-11-19 21:04:48,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=805006.6666666666, ans=0.0 2023-11-19 21:05:00,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=805073.3333333334, ans=0.025 2023-11-19 21:05:14,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=805140.0, ans=0.125 2023-11-19 21:05:15,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=805206.6666666666, ans=0.1 2023-11-19 21:05:16,332 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 550, loss[loss=0.09123, simple_loss=0.112, pruned_loss=0.0238, audio_tagging_loss=0.01145, over 16028.00 frames. ], tot_loss[loss=0.08508, simple_loss=0.1045, pruned_loss=0.02203, audio_tagging_loss=0.01078, over 2858048.83 frames. ], batch size: 61, lr: 6.44e-03, grad_scale: 32.0 2023-11-19 21:05:16,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=805206.6666666666, ans=0.0 2023-11-19 21:05:22,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=805206.6666666666, ans=0.125 2023-11-19 21:05:32,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=805273.3333333334, ans=0.1 2023-11-19 21:05:36,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=805273.3333333334, ans=0.125 2023-11-19 21:05:39,020 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 120800 2023-11-19 21:05:43,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=805340.0, ans=0.125 2023-11-19 21:05:55,441 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.38 vs. limit=10.0 2023-11-19 21:05:57,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=805406.6666666666, ans=0.1 2023-11-19 21:06:06,811 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.51 vs. 
limit=22.5 2023-11-19 21:06:19,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=805473.3333333334, ans=0.1 2023-11-19 21:06:21,508 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 600, loss[loss=0.08275, simple_loss=0.09008, pruned_loss=0.02319, audio_tagging_loss=0.01452, over 15245.00 frames. ], tot_loss[loss=0.08504, simple_loss=0.1043, pruned_loss=0.02224, audio_tagging_loss=0.01065, over 2903414.67 frames. ], batch size: 55, lr: 6.44e-03, grad_scale: 32.0 2023-11-19 21:06:30,787 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.05 vs. limit=15.0 2023-11-19 21:06:36,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=805606.6666666666, ans=0.125 2023-11-19 21:06:36,955 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.892e+01 8.276e+01 9.038e+01 9.770e+01 1.365e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-19 21:06:39,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=805606.6666666666, ans=0.0 2023-11-19 21:06:44,469 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 120850 2023-11-19 21:06:49,903 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.38 vs. limit=6.0 2023-11-19 21:06:57,447 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.56 vs. limit=22.5 2023-11-19 21:06:59,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=805740.0, ans=0.125 2023-11-19 21:07:09,817 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.49 vs. limit=22.5 2023-11-19 21:07:13,182 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.47 vs. limit=22.5 2023-11-19 21:07:27,247 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 650, loss[loss=0.04443, simple_loss=0.04705, pruned_loss=0.006757, audio_tagging_loss=0.01414, over 15067.00 frames. ], tot_loss[loss=0.08514, simple_loss=0.1042, pruned_loss=0.02229, audio_tagging_loss=0.01076, over 2939089.05 frames. 
], batch size: 58, lr: 6.44e-03, grad_scale: 32.0 2023-11-19 21:07:27,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=805873.3333333334, ans=0.0 2023-11-19 21:07:27,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=805873.3333333334, ans=0.125 2023-11-19 21:07:48,707 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 120900 2023-11-19 21:07:50,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=805940.0, ans=0.125 2023-11-19 21:07:55,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=806006.6666666666, ans=0.1 2023-11-19 21:08:07,632 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.48 vs. limit=10.0 2023-11-19 21:08:29,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=806206.6666666666, ans=0.0 2023-11-19 21:08:30,871 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 700, loss[loss=0.07789, simple_loss=0.09904, pruned_loss=0.02079, audio_tagging_loss=0.007585, over 14487.00 frames. ], tot_loss[loss=0.08464, simple_loss=0.1037, pruned_loss=0.02213, audio_tagging_loss=0.01068, over 2966749.28 frames. ], batch size: 56, lr: 6.43e-03, grad_scale: 32.0 2023-11-19 21:08:34,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=806206.6666666666, ans=0.125 2023-11-19 21:08:44,816 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.115e+01 8.069e+01 8.585e+01 9.544e+01 1.162e+02, threshold=1.717e+02, percent-clipped=0.0 2023-11-19 21:08:47,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=806273.3333333334, ans=0.125 2023-11-19 21:08:53,542 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 120950 2023-11-19 21:09:24,503 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.50 vs. limit=15.0 2023-11-19 21:09:35,889 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 750, loss[loss=0.08733, simple_loss=0.1111, pruned_loss=0.02229, audio_tagging_loss=0.009484, over 14147.00 frames. ], tot_loss[loss=0.08418, simple_loss=0.1033, pruned_loss=0.02198, audio_tagging_loss=0.01053, over 2984174.53 frames. ], batch size: 52, lr: 6.43e-03, grad_scale: 32.0 2023-11-19 21:09:36,670 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.54 vs. 
limit=15.0 2023-11-19 21:09:58,762 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 121000 2023-11-19 21:10:00,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=806606.6666666666, ans=0.0 2023-11-19 21:10:04,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=806673.3333333334, ans=0.05 2023-11-19 21:10:20,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=806740.0, ans=0.1 2023-11-19 21:10:25,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=806740.0, ans=0.2 2023-11-19 21:10:37,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=806806.6666666666, ans=0.0 2023-11-19 21:10:41,366 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 800, loss[loss=0.07724, simple_loss=0.0931, pruned_loss=0.01832, audio_tagging_loss=0.01237, over 14567.00 frames. ], tot_loss[loss=0.08415, simple_loss=0.1031, pruned_loss=0.022, audio_tagging_loss=0.0106, over 3000747.88 frames. ], batch size: 58, lr: 6.43e-03, grad_scale: 32.0 2023-11-19 21:10:49,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=806873.3333333334, ans=0.125 2023-11-19 21:10:55,545 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.953e+01 8.303e+01 9.154e+01 9.871e+01 1.410e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-19 21:11:02,916 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 121050 2023-11-19 21:11:03,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=806940.0, ans=0.1 2023-11-19 21:11:08,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=807006.6666666666, ans=0.07 2023-11-19 21:11:34,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=807140.0, ans=0.0 2023-11-19 21:11:37,347 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.37 vs. limit=15.0 2023-11-19 21:11:45,775 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 850, loss[loss=0.06731, simple_loss=0.08384, pruned_loss=0.01494, audio_tagging_loss=0.01045, over 15960.00 frames. ], tot_loss[loss=0.08445, simple_loss=0.1033, pruned_loss=0.02219, audio_tagging_loss=0.01061, over 3009833.27 frames. 
], batch size: 59, lr: 6.43e-03, grad_scale: 32.0 2023-11-19 21:11:49,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=807206.6666666666, ans=0.125 2023-11-19 21:11:57,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=807273.3333333334, ans=0.2 2023-11-19 21:12:00,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=807273.3333333334, ans=0.0 2023-11-19 21:12:07,862 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 121100 2023-11-19 21:12:44,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=807473.3333333334, ans=0.125 2023-11-19 21:12:50,519 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 900, loss[loss=0.0924, simple_loss=0.1175, pruned_loss=0.02253, audio_tagging_loss=0.01114, over 14734.00 frames. ], tot_loss[loss=0.08503, simple_loss=0.104, pruned_loss=0.02237, audio_tagging_loss=0.01065, over 3017545.81 frames. ], batch size: 55, lr: 6.43e-03, grad_scale: 32.0 2023-11-19 21:13:05,607 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.647e+01 8.167e+01 8.792e+01 9.769e+01 1.364e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-19 21:13:10,259 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.72 vs. limit=22.5 2023-11-19 21:13:13,265 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 121150 2023-11-19 21:13:18,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=807673.3333333334, ans=0.125 2023-11-19 21:13:19,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=807673.3333333334, ans=0.0 2023-11-19 21:13:21,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=807673.3333333334, ans=0.125 2023-11-19 21:13:56,695 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 950, loss[loss=0.0862, simple_loss=0.1167, pruned_loss=0.01875, audio_tagging_loss=0.009111, over 15340.00 frames. ], tot_loss[loss=0.08466, simple_loss=0.1039, pruned_loss=0.02223, audio_tagging_loss=0.01047, over 3019515.62 frames. ], batch size: 57, lr: 6.43e-03, grad_scale: 32.0 2023-11-19 21:14:03,644 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.34 vs. limit=15.0 2023-11-19 21:14:18,457 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 121200 2023-11-19 21:14:49,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=808140.0, ans=0.125 2023-11-19 21:15:01,041 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 1000, loss[loss=0.07426, simple_loss=0.08849, pruned_loss=0.02048, audio_tagging_loss=0.00954, over 14289.00 frames. ], tot_loss[loss=0.08391, simple_loss=0.1032, pruned_loss=0.02203, audio_tagging_loss=0.01028, over 3029265.67 frames. 
], batch size: 55, lr: 6.43e-03, grad_scale: 32.0 2023-11-19 21:15:15,828 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.437e+01 8.149e+01 8.966e+01 9.862e+01 1.248e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-19 21:15:19,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=808273.3333333334, ans=0.09899494936611666 2023-11-19 21:15:23,332 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 121250 2023-11-19 21:15:29,313 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 21:15:38,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=808340.0, ans=0.2 2023-11-19 21:15:58,944 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.62 vs. limit=22.5 2023-11-19 21:16:02,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=808473.3333333334, ans=0.1 2023-11-19 21:16:05,934 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 1050, loss[loss=0.1008, simple_loss=0.1211, pruned_loss=0.03055, audio_tagging_loss=0.009703, over 14874.00 frames. ], tot_loss[loss=0.08329, simple_loss=0.1024, pruned_loss=0.02186, audio_tagging_loss=0.01024, over 3034102.87 frames. ], batch size: 58, lr: 6.43e-03, grad_scale: 32.0 2023-11-19 21:16:14,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=808540.0, ans=0.1 2023-11-19 21:16:17,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=808540.0, ans=0.0 2023-11-19 21:16:20,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=808606.6666666666, ans=0.125 2023-11-19 21:16:24,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=808606.6666666666, ans=0.0 2023-11-19 21:16:28,207 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 121300 2023-11-19 21:16:28,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=808606.6666666666, ans=0.0 2023-11-19 21:16:30,108 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.79 vs. limit=15.0 2023-11-19 21:16:54,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=808740.0, ans=0.125 2023-11-19 21:16:59,555 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.49 vs. 
limit=22.5 2023-11-19 21:17:07,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=808806.6666666666, ans=0.125 2023-11-19 21:17:09,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=808806.6666666666, ans=0.0 2023-11-19 21:17:11,259 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 1100, loss[loss=0.09883, simple_loss=0.1227, pruned_loss=0.0287, audio_tagging_loss=0.008772, over 15412.00 frames. ], tot_loss[loss=0.08342, simple_loss=0.1026, pruned_loss=0.02196, audio_tagging_loss=0.01014, over 3032664.97 frames. ], batch size: 58, lr: 6.42e-03, grad_scale: 16.0 2023-11-19 21:17:13,665 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 21:17:22,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=808940.0, ans=0.125 2023-11-19 21:17:25,897 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.793e+01 8.226e+01 9.036e+01 9.842e+01 1.440e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-19 21:17:32,121 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 121350 2023-11-19 21:17:46,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=809006.6666666666, ans=0.125 2023-11-19 21:17:47,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=809073.3333333334, ans=0.125 2023-11-19 21:17:54,141 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.50 vs. limit=10.0 2023-11-19 21:18:11,271 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=15.0 2023-11-19 21:18:12,508 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.72 vs. limit=15.0 2023-11-19 21:18:14,497 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 1150, loss[loss=0.04648, simple_loss=0.04622, pruned_loss=0.007254, audio_tagging_loss=0.01611, over 15796.00 frames. ], tot_loss[loss=0.08317, simple_loss=0.1023, pruned_loss=0.02186, audio_tagging_loss=0.01015, over 3040777.21 frames. ], batch size: 62, lr: 6.42e-03, grad_scale: 16.0 2023-11-19 21:18:37,141 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 121400 2023-11-19 21:19:01,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=809406.6666666666, ans=15.0 2023-11-19 21:19:20,002 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 1200, loss[loss=0.0745, simple_loss=0.09679, pruned_loss=0.01863, audio_tagging_loss=0.007465, over 14540.00 frames. ], tot_loss[loss=0.08277, simple_loss=0.1017, pruned_loss=0.02176, audio_tagging_loss=0.01017, over 3036478.38 frames. 
], batch size: 56, lr: 6.42e-03, grad_scale: 32.0 2023-11-19 21:19:24,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=809540.0, ans=0.1 2023-11-19 21:19:27,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=809540.0, ans=0.125 2023-11-19 21:19:36,419 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.659e+01 8.249e+01 9.079e+01 9.946e+01 1.270e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-19 21:19:42,801 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 121450 2023-11-19 21:19:49,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=809673.3333333334, ans=0.2 2023-11-19 21:19:54,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=809673.3333333334, ans=0.05 2023-11-19 21:20:03,899 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 21:20:20,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=809806.6666666666, ans=0.5 2023-11-19 21:20:25,700 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 1250, loss[loss=0.08235, simple_loss=0.1055, pruned_loss=0.02262, audio_tagging_loss=0.006975, over 15064.00 frames. ], tot_loss[loss=0.08257, simple_loss=0.1013, pruned_loss=0.0217, audio_tagging_loss=0.01019, over 3035926.71 frames. ], batch size: 58, lr: 6.42e-03, grad_scale: 16.0 2023-11-19 21:20:44,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=809940.0, ans=0.125 2023-11-19 21:20:46,406 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 121500 2023-11-19 21:21:25,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=810140.0, ans=0.2 2023-11-19 21:21:28,548 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 1300, loss[loss=0.08464, simple_loss=0.1048, pruned_loss=0.02098, audio_tagging_loss=0.01124, over 15423.00 frames. ], tot_loss[loss=0.0835, simple_loss=0.1028, pruned_loss=0.0219, audio_tagging_loss=0.01018, over 3039999.33 frames. 
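In the Clipping_scale entries, the five grad-norm numbers read as min/25%/median/75%/max over a window of recent gradient norms, and the logged threshold consistently equals Clipping_scale times the median (here 2.0 x 9.079e+01 = 1.816e+02). A self-contained sketch of that bookkeeping, illustrative rather than icefall's ScaledAdam itself:

```python
from collections import deque

import torch

class GradNormClipper:
    """Keep a window of recent gradient norms, expose their quartiles,
    and clip any gradient whose norm exceeds clipping_scale * median,
    reproducing the threshold relationship visible in the log."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 100):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)

    def clip_(self, params) -> float:
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms.append(norm)
        quartiles = torch.quantile(
            torch.tensor(list(self.norms)),
            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]),
        )
        threshold = self.clipping_scale * quartiles[2].item()
        if norm > threshold:  # scale all gradients down onto the threshold
            for g in grads:
                g.mul_(threshold / norm)
        return threshold
```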
], batch size: 60, lr: 6.42e-03, grad_scale: 16.0 2023-11-19 21:21:45,102 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.756e+01 8.695e+01 9.311e+01 1.032e+02 1.222e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-19 21:21:50,145 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 121550 2023-11-19 21:21:54,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=810340.0, ans=0.125 2023-11-19 21:21:55,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=810340.0, ans=0.125 2023-11-19 21:21:58,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=810340.0, ans=0.035 2023-11-19 21:22:01,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=810340.0, ans=0.2 2023-11-19 21:22:23,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=810473.3333333334, ans=0.1 2023-11-19 21:22:32,527 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 1350, loss[loss=0.0969, simple_loss=0.1203, pruned_loss=0.02487, audio_tagging_loss=0.01188, over 15344.00 frames. ], tot_loss[loss=0.0842, simple_loss=0.1038, pruned_loss=0.02223, audio_tagging_loss=0.01007, over 3037066.67 frames. ], batch size: 58, lr: 6.42e-03, grad_scale: 16.0 2023-11-19 21:22:35,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=810540.0, ans=0.125 2023-11-19 21:22:35,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=810540.0, ans=0.1 2023-11-19 21:22:54,983 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 121600 2023-11-19 21:22:56,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=810606.6666666666, ans=0.07 2023-11-19 21:23:05,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=810673.3333333334, ans=0.125 2023-11-19 21:23:07,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=810673.3333333334, ans=0.0 2023-11-19 21:23:18,718 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 21:23:25,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=810806.6666666666, ans=0.125 2023-11-19 21:23:26,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=810806.6666666666, ans=0.125 2023-11-19 21:23:34,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=810806.6666666666, ans=0.2 2023-11-19 21:23:37,675 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 1400, loss[loss=0.07522, simple_loss=0.0919, pruned_loss=0.01512, audio_tagging_loss=0.01416, over 15970.00 frames. ], tot_loss[loss=0.08409, simple_loss=0.1036, pruned_loss=0.02205, audio_tagging_loss=0.01024, over 3039717.90 frames. ], batch size: 62, lr: 6.42e-03, grad_scale: 16.0 2023-11-19 21:23:42,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=810873.3333333334, ans=0.125 2023-11-19 21:23:44,384 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.08 vs. limit=15.0 2023-11-19 21:23:53,611 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.902e+01 8.352e+01 8.972e+01 9.668e+01 1.251e+02, threshold=1.794e+02, percent-clipped=0.0 2023-11-19 21:23:58,592 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 121650 2023-11-19 21:23:58,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=810940.0, ans=0.0 2023-11-19 21:24:16,815 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.19 vs. limit=12.0 2023-11-19 21:24:20,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=811073.3333333334, ans=0.0 2023-11-19 21:24:40,433 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 1450, loss[loss=0.08452, simple_loss=0.1064, pruned_loss=0.01983, audio_tagging_loss=0.01149, over 15918.00 frames. ], tot_loss[loss=0.08418, simple_loss=0.1037, pruned_loss=0.02203, audio_tagging_loss=0.01029, over 3042775.04 frames. ], batch size: 58, lr: 6.42e-03, grad_scale: 16.0 2023-11-19 21:24:50,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=811206.6666666666, ans=0.125 2023-11-19 21:24:58,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=811273.3333333334, ans=0.125 2023-11-19 21:25:00,183 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.30 vs. limit=22.5 2023-11-19 21:25:01,965 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 121700 2023-11-19 21:25:28,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=811406.6666666666, ans=0.1 2023-11-19 21:25:44,114 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 1500, loss[loss=0.09139, simple_loss=0.1171, pruned_loss=0.02336, audio_tagging_loss=0.009473, over 15156.00 frames. ], tot_loss[loss=0.08499, simple_loss=0.1046, pruned_loss=0.02235, audio_tagging_loss=0.01034, over 3041880.15 frames. 
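The ScheduledFloat entries track hyperparameters (dropout probabilities, skip rates, balancer probabilities) whose values are functions of batch_count rather than constants. A minimal stand-in, assuming a piecewise-linear schedule between (batch_count, value) breakpoints; the breakpoints below are invented for illustration:

```python
class ScheduledFloat:
    """Minimal stand-in for zipformer's ScheduledFloat: a float whose
    value is piecewise-linearly interpolated over batch_count between
    breakpoints, e.g. a skip rate annealing from 0.3 to 0.1."""

    def __init__(self, *points):
        self.points = sorted(points)  # (batch_count, value) pairs

    def value_at(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

skip_rate = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
print(skip_rate.value_at(810806.0))  # far past the last breakpoint -> 0.1
```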
], batch size: 58, lr: 6.41e-03, grad_scale: 16.0 2023-11-19 21:25:47,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=811540.0, ans=0.125 2023-11-19 21:26:01,436 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.708e+01 8.227e+01 9.077e+01 1.029e+02 1.490e+02, threshold=1.815e+02, percent-clipped=0.0 2023-11-19 21:26:03,312 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.54 vs. limit=15.0 2023-11-19 21:26:07,335 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 121750 2023-11-19 21:26:13,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=811673.3333333334, ans=0.1 2023-11-19 21:26:14,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=811673.3333333334, ans=0.05 2023-11-19 21:26:38,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=811806.6666666666, ans=0.0 2023-11-19 21:26:41,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=811806.6666666666, ans=0.0 2023-11-19 21:26:48,076 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 1550, loss[loss=0.05341, simple_loss=0.0518, pruned_loss=0.01445, audio_tagging_loss=0.01306, over 15834.00 frames. ], tot_loss[loss=0.08506, simple_loss=0.1042, pruned_loss=0.02239, audio_tagging_loss=0.01056, over 3043276.18 frames. ], batch size: 63, lr: 6.41e-03, grad_scale: 16.0 2023-11-19 21:27:08,164 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.46 vs. limit=12.0 2023-11-19 21:27:11,377 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 121800 2023-11-19 21:27:18,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=812006.6666666666, ans=0.1 2023-11-19 21:27:44,630 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.24 vs. limit=15.0 2023-11-19 21:27:46,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=812140.0, ans=0.5 2023-11-19 21:27:54,379 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 1600, loss[loss=0.07145, simple_loss=0.09914, pruned_loss=0.01211, audio_tagging_loss=0.009777, over 14592.00 frames. ], tot_loss[loss=0.08436, simple_loss=0.1034, pruned_loss=0.02203, audio_tagging_loss=0.01062, over 3041388.76 frames. 
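The Whitening entries compare a measured isotropy statistic of a module's activations against a scheduled limit; a penalty pushing features toward whiter covariance would only engage once the metric exceeds the limit, which is why most entries report "metric=X vs. limit=Y" with X below Y. One plausible form of such a metric, an assumption rather than a transcription of scaling.py, equals 1.0 for an isotropic covariance and grows as variance concentrates in fewer directions:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """Assumed isotropy statistic in the spirit of the 'Whitening' log
    entries: split channels into groups, estimate each group's covariance
    C, and return dim * trace(C @ C) / trace(C)**2, which is 1.0 when C is
    proportional to the identity and larger otherwise."""
    x = x.reshape(-1, x.shape[-1])
    frames, channels = x.shape
    x = x.reshape(frames, num_groups, channels // num_groups)
    metrics = []
    for g in range(num_groups):
        xg = x[:, g, :]
        xg = xg - xg.mean(dim=0)
        cov = (xg.T @ xg) / frames
        d = cov.shape[0]
        metrics.append(d * (cov @ cov).trace() / cov.trace() ** 2)
    return torch.stack(metrics).mean()

feats = torch.randn(1000, 192)        # white features -> metric near 1.0
print(whitening_metric(feats).item())  # well under a limit like 15.0
```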
], batch size: 53, lr: 6.41e-03, grad_scale: 32.0 2023-11-19 21:27:55,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=812206.6666666666, ans=0.125 2023-11-19 21:28:10,247 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.741e+01 8.357e+01 8.989e+01 9.853e+01 1.199e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-19 21:28:13,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=812273.3333333334, ans=0.125 2023-11-19 21:28:15,893 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 121850 2023-11-19 21:28:20,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=812340.0, ans=0.125 2023-11-19 21:28:33,238 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.78 vs. limit=22.5 2023-11-19 21:28:37,317 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.42 vs. limit=15.0 2023-11-19 21:28:50,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=812473.3333333334, ans=0.1 2023-11-19 21:28:57,466 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 1650, loss[loss=0.09122, simple_loss=0.1154, pruned_loss=0.02258, audio_tagging_loss=0.01096, over 14701.00 frames. ], tot_loss[loss=0.08472, simple_loss=0.1039, pruned_loss=0.02213, audio_tagging_loss=0.01061, over 3043647.96 frames. ], batch size: 55, lr: 6.41e-03, grad_scale: 32.0 2023-11-19 21:28:59,450 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.69 vs. limit=15.0 2023-11-19 21:29:01,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=812540.0, ans=0.125 2023-11-19 21:29:10,033 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.30 vs. limit=6.0 2023-11-19 21:29:20,479 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 121900 2023-11-19 21:30:01,941 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 1700, loss[loss=0.1074, simple_loss=0.1348, pruned_loss=0.03115, audio_tagging_loss=0.008867, over 16131.00 frames. ], tot_loss[loss=0.08508, simple_loss=0.1044, pruned_loss=0.02225, audio_tagging_loss=0.01063, over 3048893.74 frames. ], batch size: 59, lr: 6.41e-03, grad_scale: 32.0 2023-11-19 21:30:03,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=812873.3333333334, ans=0.1 2023-11-19 21:30:11,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=812873.3333333334, ans=0.125 2023-11-19 21:30:19,200 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.98 vs. 
limit=15.0 2023-11-19 21:30:19,797 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.291e+01 8.451e+01 9.168e+01 1.014e+02 1.661e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-19 21:30:22,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=812940.0, ans=0.125 2023-11-19 21:30:24,730 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 121950 2023-11-19 21:30:50,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=813073.3333333334, ans=0.125 2023-11-19 21:31:07,576 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 1750, loss[loss=0.1048, simple_loss=0.1321, pruned_loss=0.03044, audio_tagging_loss=0.008325, over 15353.00 frames. ], tot_loss[loss=0.08509, simple_loss=0.1044, pruned_loss=0.02222, audio_tagging_loss=0.01067, over 3049695.40 frames. ], batch size: 56, lr: 6.41e-03, grad_scale: 32.0 2023-11-19 21:31:11,587 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 21:31:27,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=813273.3333333334, ans=0.125 2023-11-19 21:31:28,409 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 122000 2023-11-19 21:31:33,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=813340.0, ans=0.125 2023-11-19 21:31:57,882 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.50 vs. limit=22.5 2023-11-19 21:32:12,119 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 1800, loss[loss=0.07669, simple_loss=0.09613, pruned_loss=0.01791, audio_tagging_loss=0.01072, over 14997.00 frames. ], tot_loss[loss=0.08428, simple_loss=0.1035, pruned_loss=0.02192, audio_tagging_loss=0.0106, over 3049788.43 frames. ], batch size: 55, lr: 6.41e-03, grad_scale: 32.0 2023-11-19 21:32:21,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=813540.0, ans=0.0 2023-11-19 21:32:22,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=813540.0, ans=0.0 2023-11-19 21:32:28,675 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.860e+01 8.061e+01 8.839e+01 9.659e+01 3.662e+02, threshold=1.768e+02, percent-clipped=1.0 2023-11-19 21:32:30,539 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.48 vs. 
limit=15.0 2023-11-19 21:32:34,305 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 122050 2023-11-19 21:32:38,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=813673.3333333334, ans=0.125 2023-11-19 21:32:46,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=813673.3333333334, ans=0.125 2023-11-19 21:32:55,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=813740.0, ans=0.2 2023-11-19 21:33:13,852 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=15.0 2023-11-19 21:33:15,902 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 21:33:16,797 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 1850, loss[loss=0.07744, simple_loss=0.1019, pruned_loss=0.01658, audio_tagging_loss=0.009928, over 15333.00 frames. ], tot_loss[loss=0.08453, simple_loss=0.104, pruned_loss=0.02205, audio_tagging_loss=0.01049, over 3047850.46 frames. ], batch size: 56, lr: 6.40e-03, grad_scale: 16.0 2023-11-19 21:33:18,505 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.84 vs. limit=22.5 2023-11-19 21:33:23,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=813873.3333333334, ans=0.125 2023-11-19 21:33:24,230 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.37 vs. limit=22.5 2023-11-19 21:33:38,886 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 122100 2023-11-19 21:34:21,906 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 1900, loss[loss=0.09611, simple_loss=0.1142, pruned_loss=0.03, audio_tagging_loss=0.009019, over 15297.00 frames. ], tot_loss[loss=0.08388, simple_loss=0.1032, pruned_loss=0.02178, audio_tagging_loss=0.01049, over 3053266.43 frames. ], batch size: 56, lr: 6.40e-03, grad_scale: 16.0 2023-11-19 21:34:36,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=814273.3333333334, ans=0.1 2023-11-19 21:34:39,675 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.976e+01 8.118e+01 8.678e+01 9.738e+01 1.673e+02, threshold=1.736e+02, percent-clipped=0.0 2023-11-19 21:34:40,311 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.10 vs. limit=12.0 2023-11-19 21:34:43,634 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 122150 2023-11-19 21:35:01,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=814406.6666666666, ans=0.125 2023-11-19 21:35:02,575 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.82 vs. 
limit=6.0 2023-11-19 21:35:04,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=814406.6666666666, ans=0.125 2023-11-19 21:35:10,045 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 21:35:26,510 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 1950, loss[loss=0.08921, simple_loss=0.105, pruned_loss=0.0261, audio_tagging_loss=0.01063, over 15723.00 frames. ], tot_loss[loss=0.08345, simple_loss=0.1025, pruned_loss=0.0217, audio_tagging_loss=0.01048, over 3052051.93 frames. ], batch size: 60, lr: 6.40e-03, grad_scale: 16.0 2023-11-19 21:35:48,128 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 122200 2023-11-19 21:35:57,766 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.67 vs. limit=15.0 2023-11-19 21:36:00,049 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.28 vs. limit=15.0 2023-11-19 21:36:08,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=814740.0, ans=0.1 2023-11-19 21:36:18,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=814806.6666666666, ans=0.125 2023-11-19 21:36:28,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=814806.6666666666, ans=0.125 2023-11-19 21:36:31,167 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 2000, loss[loss=0.07786, simple_loss=0.09798, pruned_loss=0.02035, audio_tagging_loss=0.008518, over 15848.00 frames. ], tot_loss[loss=0.08341, simple_loss=0.1024, pruned_loss=0.02176, audio_tagging_loss=0.01045, over 3049249.91 frames. ], batch size: 60, lr: 6.40e-03, grad_scale: 32.0 2023-11-19 21:36:40,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=814873.3333333334, ans=0.125 2023-11-19 21:36:49,640 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.135e+01 8.499e+01 9.542e+01 1.090e+02 1.717e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-19 21:36:53,399 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 122250 2023-11-19 21:36:58,002 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.79 vs. limit=22.5 2023-11-19 21:37:04,511 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.67 vs. limit=12.0 2023-11-19 21:37:16,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=815073.3333333334, ans=0.0 2023-11-19 21:37:21,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=815073.3333333334, ans=0.1 2023-11-19 21:37:22,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=815140.0, ans=0.0 2023-11-19 21:37:36,784 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 2050, loss[loss=0.08515, simple_loss=0.1037, pruned_loss=0.02497, audio_tagging_loss=0.008319, over 14480.00 frames. 
], tot_loss[loss=0.08369, simple_loss=0.1027, pruned_loss=0.02194, audio_tagging_loss=0.0104, over 3044860.47 frames. ], batch size: 54, lr: 6.40e-03, grad_scale: 32.0 2023-11-19 21:37:37,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=815206.6666666666, ans=0.1 2023-11-19 21:37:39,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=815206.6666666666, ans=0.125 2023-11-19 21:37:40,050 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.15 vs. limit=15.0 2023-11-19 21:37:41,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=815206.6666666666, ans=0.1 2023-11-19 21:37:58,478 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 122300 2023-11-19 21:37:58,991 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.76 vs. limit=22.5 2023-11-19 21:38:05,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=815340.0, ans=0.0 2023-11-19 21:38:19,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=815406.6666666666, ans=12.0 2023-11-19 21:38:24,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=815406.6666666666, ans=0.1 2023-11-19 21:38:31,667 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.23 vs. limit=22.5 2023-11-19 21:38:33,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=815473.3333333334, ans=0.125 2023-11-19 21:38:39,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=815540.0, ans=0.1 2023-11-19 21:38:40,858 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 2100, loss[loss=0.06391, simple_loss=0.06801, pruned_loss=0.0197, audio_tagging_loss=0.01021, over 17180.00 frames. ], tot_loss[loss=0.08389, simple_loss=0.1034, pruned_loss=0.02189, audio_tagging_loss=0.01031, over 3049036.88 frames. 
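The bracketed loss entries decompose consistently: the logged loss equals 0.5 times simple_loss plus pruned_loss plus audio_tagging_loss, as a quick check against the batch 2100 entry just above confirms. A sketch of that combination; the 0.5 and 1.0 scales are inferred from the logged numbers, not read out of the code:

```python
def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float,
                  simple_scale: float = 0.5,
                  tagging_scale: float = 1.0) -> float:
    """Reconstruction of how the logged 'loss' appears to combine the
    pruned-transducer terms with the audio-tagging term."""
    return (simple_scale * simple_loss
            + pruned_loss
            + tagging_scale * audio_tagging_loss)

# Batch 2100 above: loss=0.06391, simple_loss=0.06801,
# pruned_loss=0.0197, audio_tagging_loss=0.01021
print(combined_loss(0.06801, 0.0197, 0.01021))  # ~0.06392, matching
```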
], batch size: 66, lr: 6.40e-03, grad_scale: 32.0 2023-11-19 21:38:59,063 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.154e+01 8.161e+01 9.375e+01 1.016e+02 1.346e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-19 21:39:02,958 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 122350 2023-11-19 21:39:11,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=815673.3333333334, ans=0.125 2023-11-19 21:39:17,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=815673.3333333334, ans=0.07 2023-11-19 21:39:17,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=815673.3333333334, ans=0.1 2023-11-19 21:39:19,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=815740.0, ans=0.0 2023-11-19 21:39:19,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=815740.0, ans=0.125 2023-11-19 21:39:21,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=815740.0, ans=0.125 2023-11-19 21:39:37,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=815806.6666666666, ans=0.0 2023-11-19 21:39:45,688 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 2150, loss[loss=0.07957, simple_loss=0.09706, pruned_loss=0.02004, audio_tagging_loss=0.011, over 14600.00 frames. ], tot_loss[loss=0.0833, simple_loss=0.1027, pruned_loss=0.02165, audio_tagging_loss=0.01032, over 3042841.94 frames. ], batch size: 56, lr: 6.40e-03, grad_scale: 32.0 2023-11-19 21:39:50,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=815873.3333333334, ans=0.0 2023-11-19 21:39:50,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=815873.3333333334, ans=0.125 2023-11-19 21:39:54,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=815873.3333333334, ans=0.125 2023-11-19 21:39:54,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=815873.3333333334, ans=0.0 2023-11-19 21:40:08,138 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 122400 2023-11-19 21:40:24,488 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 21:40:51,639 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 2200, loss[loss=0.09373, simple_loss=0.1117, pruned_loss=0.02854, audio_tagging_loss=0.009341, over 15562.00 frames. ], tot_loss[loss=0.08304, simple_loss=0.1024, pruned_loss=0.02157, audio_tagging_loss=0.01028, over 3040328.74 frames. 
], batch size: 59, lr: 6.40e-03, grad_scale: 32.0 2023-11-19 21:40:58,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=816206.6666666666, ans=0.1 2023-11-19 21:41:08,862 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.683e+01 8.220e+01 9.086e+01 1.022e+02 1.678e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-19 21:41:10,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=816273.3333333334, ans=0.09899494936611666 2023-11-19 21:41:12,646 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 122450 2023-11-19 21:41:18,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=816340.0, ans=0.0 2023-11-19 21:41:20,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=816340.0, ans=0.125 2023-11-19 21:41:20,955 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.33 vs. limit=15.0 2023-11-19 21:41:39,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=816406.6666666666, ans=0.125 2023-11-19 21:41:41,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=816406.6666666666, ans=0.07 2023-11-19 21:41:55,711 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 2250, loss[loss=0.06364, simple_loss=0.06945, pruned_loss=0.0154, audio_tagging_loss=0.01352, over 15864.00 frames. ], tot_loss[loss=0.08383, simple_loss=0.1032, pruned_loss=0.02192, audio_tagging_loss=0.01031, over 3037077.16 frames. ], batch size: 62, lr: 6.39e-03, grad_scale: 32.0 2023-11-19 21:42:09,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=816606.6666666666, ans=0.125 2023-11-19 21:42:17,978 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 122500 2023-11-19 21:42:56,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=816806.6666666666, ans=0.125 2023-11-19 21:43:00,650 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 2300, loss[loss=0.06771, simple_loss=0.07797, pruned_loss=0.01725, audio_tagging_loss=0.01148, over 15429.00 frames. ], tot_loss[loss=0.08433, simple_loss=0.1039, pruned_loss=0.02202, audio_tagging_loss=0.01035, over 3036612.09 frames. 
], batch size: 58, lr: 6.39e-03, grad_scale: 32.0 2023-11-19 21:43:09,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=816873.3333333334, ans=0.1 2023-11-19 21:43:19,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=816940.0, ans=0.125 2023-11-19 21:43:19,977 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.000e+01 8.330e+01 8.975e+01 9.637e+01 1.370e+02, threshold=1.795e+02, percent-clipped=0.0 2023-11-19 21:43:23,805 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 122550 2023-11-19 21:43:26,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=817006.6666666666, ans=0.125 2023-11-19 21:43:58,434 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 21:44:00,388 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.36 vs. limit=22.5 2023-11-19 21:44:07,000 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 2350, loss[loss=0.09041, simple_loss=0.1037, pruned_loss=0.02429, audio_tagging_loss=0.01428, over 14594.00 frames. ], tot_loss[loss=0.08463, simple_loss=0.1043, pruned_loss=0.022, audio_tagging_loss=0.01047, over 3034627.94 frames. ], batch size: 54, lr: 6.39e-03, grad_scale: 16.0 2023-11-19 21:44:20,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=817273.3333333334, ans=0.2 2023-11-19 21:44:28,149 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 122600 2023-11-19 21:44:54,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=817406.6666666666, ans=0.1 2023-11-19 21:45:11,211 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 2400, loss[loss=0.1035, simple_loss=0.1206, pruned_loss=0.03293, audio_tagging_loss=0.01023, over 14842.00 frames. ], tot_loss[loss=0.08432, simple_loss=0.1039, pruned_loss=0.0219, audio_tagging_loss=0.01046, over 3032999.49 frames. 
], batch size: 58, lr: 6.39e-03, grad_scale: 32.0 2023-11-19 21:45:11,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=817540.0, ans=0.125 2023-11-19 21:45:30,248 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.655e+01 8.060e+01 8.978e+01 9.886e+01 1.686e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-19 21:45:32,849 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 122650 2023-11-19 21:45:44,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=817673.3333333334, ans=0.1 2023-11-19 21:45:59,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=817740.0, ans=0.035 2023-11-19 21:46:13,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=817806.6666666666, ans=0.1 2023-11-19 21:46:13,466 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=22.5 2023-11-19 21:46:15,438 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 2450, loss[loss=0.08087, simple_loss=0.0954, pruned_loss=0.02102, audio_tagging_loss=0.01216, over 16256.00 frames. ], tot_loss[loss=0.08468, simple_loss=0.1045, pruned_loss=0.02194, audio_tagging_loss=0.01049, over 3034622.37 frames. ], batch size: 63, lr: 6.39e-03, grad_scale: 32.0 2023-11-19 21:46:16,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=817873.3333333334, ans=0.0 2023-11-19 21:46:36,543 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.72 vs. limit=15.0 2023-11-19 21:46:37,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=817940.0, ans=0.1 2023-11-19 21:46:38,574 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 122700 2023-11-19 21:46:57,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=818073.3333333334, ans=0.0 2023-11-19 21:47:03,913 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.11 vs. limit=22.5 2023-11-19 21:47:21,098 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 2500, loss[loss=0.08245, simple_loss=0.09739, pruned_loss=0.02403, audio_tagging_loss=0.009725, over 14600.00 frames. ], tot_loss[loss=0.08448, simple_loss=0.1039, pruned_loss=0.02195, audio_tagging_loss=0.01056, over 3037481.99 frames. 
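The lr column decays very slowly here (6.43e-03 down to 6.39e-03 across a few hundred batches), consistent with icefall's Eden schedule, a smooth power-law decay in both batch index and epoch. A sketch using the usual recipe defaults and ignoring Eden's optional warm-up factor; the epoch fraction in the example is a guess:

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Eden schedule as used by icefall's ScaledAdam recipes: the lr is
    base_lr damped by power-law factors in batch count and epoch."""
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Around the entries above: batch_idx ~122000, roughly epoch 10.3
print(eden_lr(0.045, 122000, 10.3))  # ~6.3e-03, close to the logged 6.39e-03
```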
], batch size: 54, lr: 6.39e-03, grad_scale: 32.0 2023-11-19 21:47:40,053 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.459e+01 8.244e+01 8.954e+01 9.755e+01 1.221e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-19 21:47:42,592 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 122750 2023-11-19 21:48:11,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=818473.3333333334, ans=0.125 2023-11-19 21:48:22,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=818473.3333333334, ans=0.0 2023-11-19 21:48:25,692 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 2550, loss[loss=0.09141, simple_loss=0.1091, pruned_loss=0.02649, audio_tagging_loss=0.01035, over 14834.00 frames. ], tot_loss[loss=0.08426, simple_loss=0.1037, pruned_loss=0.02199, audio_tagging_loss=0.01041, over 3034644.46 frames. ], batch size: 54, lr: 6.39e-03, grad_scale: 32.0 2023-11-19 21:48:47,304 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 122800 2023-11-19 21:49:08,981 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.27 vs. limit=22.5 2023-11-19 21:49:30,341 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 2600, loss[loss=0.08501, simple_loss=0.1053, pruned_loss=0.02483, audio_tagging_loss=0.007514, over 15759.00 frames. ], tot_loss[loss=0.08343, simple_loss=0.1027, pruned_loss=0.02183, audio_tagging_loss=0.01027, over 3042106.52 frames. ], batch size: 58, lr: 6.39e-03, grad_scale: 16.0 2023-11-19 21:49:35,044 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 21:49:52,168 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.938e+01 8.517e+01 9.235e+01 1.002e+02 1.405e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-19 21:49:53,576 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 122850 2023-11-19 21:49:56,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=819006.6666666666, ans=0.1 2023-11-19 21:50:14,394 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.57 vs. limit=15.0 2023-11-19 21:50:21,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=819140.0, ans=0.1 2023-11-19 21:50:26,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=819140.0, ans=0.1 2023-11-19 21:50:28,425 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.09 vs. limit=15.0 2023-11-19 21:50:35,310 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 2650, loss[loss=0.06491, simple_loss=0.08449, pruned_loss=0.01505, audio_tagging_loss=0.007615, over 15079.00 frames. ], tot_loss[loss=0.0831, simple_loss=0.1023, pruned_loss=0.02177, audio_tagging_loss=0.01017, over 3042044.89 frames. 
], batch size: 57, lr: 6.38e-03, grad_scale: 16.0 2023-11-19 21:50:53,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=819273.3333333334, ans=0.0 2023-11-19 21:50:58,166 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 122900 2023-11-19 21:51:04,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=819340.0, ans=0.0 2023-11-19 21:51:19,245 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.67 vs. limit=15.0 2023-11-19 21:51:25,645 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.35 vs. limit=15.0 2023-11-19 21:51:35,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=819473.3333333334, ans=0.125 2023-11-19 21:51:41,503 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 2700, loss[loss=0.0893, simple_loss=0.1098, pruned_loss=0.02379, audio_tagging_loss=0.01061, over 14361.00 frames. ], tot_loss[loss=0.08304, simple_loss=0.1023, pruned_loss=0.0218, audio_tagging_loss=0.01011, over 3047818.42 frames. ], batch size: 55, lr: 6.38e-03, grad_scale: 16.0 2023-11-19 21:51:43,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=819540.0, ans=0.125 2023-11-19 21:52:01,446 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.391e+01 8.510e+01 9.186e+01 1.041e+02 2.301e+02, threshold=1.837e+02, percent-clipped=1.0 2023-11-19 21:52:03,518 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 122950 2023-11-19 21:52:19,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=819740.0, ans=0.0 2023-11-19 21:52:40,854 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.43 vs. limit=12.0 2023-11-19 21:52:46,276 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 2750, loss[loss=0.06955, simple_loss=0.0858, pruned_loss=0.01717, audio_tagging_loss=0.009479, over 14798.00 frames. ], tot_loss[loss=0.08254, simple_loss=0.1014, pruned_loss=0.02164, audio_tagging_loss=0.01021, over 3045275.35 frames. ], batch size: 55, lr: 6.38e-03, grad_scale: 16.0 2023-11-19 21:52:46,959 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.00 vs. limit=15.0 2023-11-19 21:53:01,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=819940.0, ans=0.125 2023-11-19 21:53:08,473 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 123000 2023-11-19 21:53:21,537 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.03 vs. 
limit=22.5 2023-11-19 21:53:27,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=820073.3333333334, ans=0.125 2023-11-19 21:53:33,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=820073.3333333334, ans=0.1 2023-11-19 21:53:34,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=820073.3333333334, ans=0.1 2023-11-19 21:53:36,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=820073.3333333334, ans=0.0 2023-11-19 21:53:39,855 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 21:53:40,873 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 21:53:47,928 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 21:53:51,407 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 2800, loss[loss=0.071, simple_loss=0.09045, pruned_loss=0.01777, audio_tagging_loss=0.007999, over 15786.00 frames. ], tot_loss[loss=0.08261, simple_loss=0.1014, pruned_loss=0.0217, audio_tagging_loss=0.01021, over 3050634.19 frames. ], batch size: 60, lr: 6.38e-03, grad_scale: 16.0 2023-11-19 21:54:04,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=820273.3333333334, ans=0.2 2023-11-19 21:54:05,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=820273.3333333334, ans=0.125 2023-11-19 21:54:07,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=820273.3333333334, ans=0.05 2023-11-19 21:54:14,040 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.993e+01 8.139e+01 8.815e+01 9.780e+01 1.679e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-19 21:54:14,194 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 123050 2023-11-19 21:54:14,768 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.98 vs. limit=6.0 2023-11-19 21:54:33,160 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.59 vs. 
limit=15.0 2023-11-19 21:54:42,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=820473.3333333334, ans=0.0 2023-11-19 21:54:49,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=820473.3333333334, ans=0.1 2023-11-19 21:54:53,812 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.64 vs. limit=15.0 2023-11-19 21:54:56,814 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 2850, loss[loss=0.08897, simple_loss=0.1109, pruned_loss=0.02337, audio_tagging_loss=0.01017, over 15410.00 frames. ], tot_loss[loss=0.08279, simple_loss=0.1016, pruned_loss=0.02178, audio_tagging_loss=0.01022, over 3047373.35 frames. ], batch size: 56, lr: 6.38e-03, grad_scale: 16.0 2023-11-19 21:54:57,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=820540.0, ans=0.125 2023-11-19 21:55:02,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=820540.0, ans=0.1 2023-11-19 21:55:02,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=820540.0, ans=0.0 2023-11-19 21:55:10,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=820606.6666666666, ans=0.125 2023-11-19 21:55:18,661 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 123100 2023-11-19 21:55:44,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=820740.0, ans=0.0 2023-11-19 21:55:51,795 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.14 vs. limit=15.0 2023-11-19 21:56:02,100 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 2900, loss[loss=0.06663, simple_loss=0.08617, pruned_loss=0.01579, audio_tagging_loss=0.007745, over 15120.00 frames. ], tot_loss[loss=0.08207, simple_loss=0.1007, pruned_loss=0.02147, audio_tagging_loss=0.01023, over 3044680.89 frames. ], batch size: 59, lr: 6.38e-03, grad_scale: 16.0 2023-11-19 21:56:05,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=820873.3333333334, ans=0.0 2023-11-19 21:56:14,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=820940.0, ans=0.05 2023-11-19 21:56:23,814 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.958e+01 8.184e+01 8.854e+01 9.488e+01 1.292e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-19 21:56:23,969 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 123150 2023-11-19 21:56:28,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=821006.6666666666, ans=0.1 2023-11-19 21:57:06,583 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 2950, loss[loss=0.09218, simple_loss=0.113, pruned_loss=0.02231, audio_tagging_loss=0.01338, over 15836.00 frames. ], tot_loss[loss=0.08281, simple_loss=0.1017, pruned_loss=0.02175, audio_tagging_loss=0.0102, over 3044636.57 frames. 
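A few entries below, at batch 3000, the trainer pauses for a validation pass: it logs "Computing validation loss", reports the same loss decomposition averaged over roughly 4.68M held-out frames, and notes peak CUDA memory before resuming. A minimal sketch of that interleaving; the names and trigger condition are illustrative, not icefall's exact API:

```python
import torch

def maybe_validate(model, valid_loader, batch_idx: int,
                   valid_interval: int = 3000) -> None:
    """Sketch of the periodic validation visible below: switch to eval
    mode, average the loss over the validation set, log peak CUDA
    memory, then return to training (hypothetical compute_loss API)."""
    if batch_idx == 0 or batch_idx % valid_interval != 0:
        return
    model.eval()
    tot_loss, tot_frames = 0.0, 0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = model.compute_loss(batch)  # hypothetical
            tot_loss += loss.item() * num_frames
            tot_frames += num_frames
    print(f"validation: loss={tot_loss / tot_frames:.5f}, "
          f"over {tot_frames} frames")
    print(f"Maximum memory allocated so far is "
          f"{torch.cuda.max_memory_allocated() // 2**20}MB")
    model.train()
```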
], batch size: 59, lr: 6.38e-03, grad_scale: 16.0 2023-11-19 21:57:06,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=821206.6666666666, ans=0.0 2023-11-19 21:57:12,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=821206.6666666666, ans=0.125 2023-11-19 21:57:25,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=821273.3333333334, ans=0.125 2023-11-19 21:57:28,834 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 123200 2023-11-19 21:57:33,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=821340.0, ans=0.2 2023-11-19 21:57:42,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=821340.0, ans=0.0 2023-11-19 21:57:47,367 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=7.120e-01 2023-11-19 21:58:00,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=821473.3333333334, ans=0.2 2023-11-19 21:58:04,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=821473.3333333334, ans=0.125 2023-11-19 21:58:05,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=821473.3333333334, ans=0.125 2023-11-19 21:58:06,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=821473.3333333334, ans=0.125 2023-11-19 21:58:12,094 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 3000, loss[loss=0.0613, simple_loss=0.07175, pruned_loss=0.01095, audio_tagging_loss=0.01448, over 14927.00 frames. ], tot_loss[loss=0.08396, simple_loss=0.1033, pruned_loss=0.02202, audio_tagging_loss=0.01028, over 3047838.92 frames. ], batch size: 56, lr: 6.37e-03, grad_scale: 16.0 2023-11-19 21:58:12,098 INFO [train_asr.py:1285] (0/4) Computing validation loss 2023-11-19 21:58:30,926 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([4.4960, 3.7834, 3.5432, 4.2746, 3.6238, 3.6621, 3.9590, 3.3986], device='cuda:0') 2023-11-19 21:58:52,220 INFO [train_asr.py:1294] (0/4) Epoch 11, validation: loss=0.06441, simple_loss=0.05497, pruned_loss=0.006219, audio_tagging_loss=0.03071, over 4681554.00 frames. 2023-11-19 21:58:52,221 INFO [train_asr.py:1295] (0/4) Maximum memory allocated so far is 25925MB 2023-11-19 21:58:59,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=821540.0, ans=0.2 2023-11-19 21:59:07,658 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 21:59:14,428 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.751e+01 8.445e+01 9.049e+01 1.018e+02 1.456e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-19 21:59:14,562 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 123250 2023-11-19 21:59:15,122 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.37 vs. 
limit=22.5 2023-11-19 21:59:19,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=821673.3333333334, ans=0.2 2023-11-19 21:59:43,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=821806.6666666666, ans=0.2 2023-11-19 21:59:54,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=821873.3333333334, ans=0.1 2023-11-19 21:59:55,778 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 3050, loss[loss=0.08536, simple_loss=0.1029, pruned_loss=0.0185, audio_tagging_loss=0.01542, over 15592.00 frames. ], tot_loss[loss=0.08363, simple_loss=0.1027, pruned_loss=0.02191, audio_tagging_loss=0.01036, over 3053423.16 frames. ], batch size: 59, lr: 6.37e-03, grad_scale: 16.0 2023-11-19 22:00:14,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=821940.0, ans=0.0 2023-11-19 22:00:18,121 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 123300 2023-11-19 22:00:18,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=821940.0, ans=0.125 2023-11-19 22:00:33,962 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 22:00:40,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=822073.3333333334, ans=0.2 2023-11-19 22:00:40,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=822073.3333333334, ans=0.0 2023-11-19 22:00:44,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=822073.3333333334, ans=10.0 2023-11-19 22:00:49,808 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.18 vs. limit=10.0 2023-11-19 22:01:01,041 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 3100, loss[loss=0.05782, simple_loss=0.06494, pruned_loss=0.01065, audio_tagging_loss=0.0147, over 15380.00 frames. ], tot_loss[loss=0.08295, simple_loss=0.1018, pruned_loss=0.02155, audio_tagging_loss=0.01052, over 3050520.40 frames. ], batch size: 58, lr: 6.37e-03, grad_scale: 16.0 2023-11-19 22:01:09,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=822206.6666666666, ans=0.0 2023-11-19 22:01:11,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=822206.6666666666, ans=0.0 2023-11-19 22:01:11,651 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.75 vs. 
limit=22.5 2023-11-19 22:01:22,955 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 8.148e+01 8.871e+01 9.460e+01 1.235e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-19 22:01:23,097 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 123350 2023-11-19 22:01:24,459 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 22:01:25,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=822340.0, ans=0.0 2023-11-19 22:01:41,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=822406.6666666666, ans=0.1 2023-11-19 22:01:41,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=822406.6666666666, ans=15.0 2023-11-19 22:01:45,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=822406.6666666666, ans=0.0 2023-11-19 22:01:48,396 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.46 vs. limit=10.0 2023-11-19 22:02:04,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=822540.0, ans=0.0 2023-11-19 22:02:05,585 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 3150, loss[loss=0.08195, simple_loss=0.09562, pruned_loss=0.02244, audio_tagging_loss=0.01169, over 14552.00 frames. ], tot_loss[loss=0.08428, simple_loss=0.1039, pruned_loss=0.02191, audio_tagging_loss=0.01043, over 3058561.12 frames. ], batch size: 57, lr: 6.37e-03, grad_scale: 16.0 2023-11-19 22:02:05,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=822540.0, ans=0.1 2023-11-19 22:02:15,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=822540.0, ans=0.125 2023-11-19 22:02:18,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=822606.6666666666, ans=0.0 2023-11-19 22:02:27,751 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 123400 2023-11-19 22:02:47,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=822740.0, ans=0.125 2023-11-19 22:03:06,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=822806.6666666666, ans=0.125 2023-11-19 22:03:10,320 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 3200, loss[loss=0.1101, simple_loss=0.1379, pruned_loss=0.02962, audio_tagging_loss=0.01147, over 15300.00 frames. ], tot_loss[loss=0.08474, simple_loss=0.1047, pruned_loss=0.02193, audio_tagging_loss=0.01043, over 3053019.06 frames. 
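Note on the per-batch summaries: the logged numbers are consistent with the total objective being 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss. For batch 3200 above, 0.5 * 0.1047 + 0.02193 + 0.01043 ≈ 0.0847, matching tot_loss within rounding, and the same holds for the other summaries in this excerpt. A minimal sketch of that combination, assuming this weighting and omitting the warm-up ramp that icefall recipes apply to the simple/pruned balance early in training:

    import torch

    def combine_losses(simple_loss: torch.Tensor,
                       pruned_loss: torch.Tensor,
                       audio_tagging_loss: torch.Tensor,
                       simple_loss_scale: float = 0.5,
                       audio_tagging_loss_scale: float = 1.0) -> torch.Tensor:
        # Weighting inferred from the logged values; warm-up adjustments omitted.
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)
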
], batch size: 56, lr: 6.37e-03, grad_scale: 32.0 2023-11-19 22:03:10,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=822873.3333333334, ans=0.0 2023-11-19 22:03:21,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=822873.3333333334, ans=0.1 2023-11-19 22:03:24,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=822940.0, ans=0.95 2023-11-19 22:03:32,360 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.734e+01 8.258e+01 8.832e+01 9.801e+01 1.591e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-19 22:03:32,513 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 123450 2023-11-19 22:03:41,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=823006.6666666666, ans=0.1 2023-11-19 22:03:43,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=823006.6666666666, ans=0.125 2023-11-19 22:03:57,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=823073.3333333334, ans=0.125 2023-11-19 22:04:15,936 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 3250, loss[loss=0.08522, simple_loss=0.1096, pruned_loss=0.02277, audio_tagging_loss=0.007652, over 15744.00 frames. ], tot_loss[loss=0.08505, simple_loss=0.1052, pruned_loss=0.02197, audio_tagging_loss=0.01047, over 3053372.81 frames. ], batch size: 58, lr: 6.37e-03, grad_scale: 32.0 2023-11-19 22:04:19,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=823206.6666666666, ans=0.125 2023-11-19 22:04:26,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=823206.6666666666, ans=0.025 2023-11-19 22:04:33,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=823273.3333333334, ans=0.0 2023-11-19 22:04:33,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=823273.3333333334, ans=0.125 2023-11-19 22:04:36,959 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 123500 2023-11-19 22:04:43,003 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.77 vs. limit=15.0 2023-11-19 22:05:04,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=823406.6666666666, ans=0.125 2023-11-19 22:05:08,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=823473.3333333334, ans=0.125 2023-11-19 22:05:16,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=823473.3333333334, ans=0.125 2023-11-19 22:05:18,924 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 3300, loss[loss=0.1007, simple_loss=0.126, pruned_loss=0.02761, audio_tagging_loss=0.0101, over 14598.00 frames. 
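Note on the scaling.py:213 entries: each records the value ("ans") currently in effect for a ScheduledFloat, a hyperparameter (dropout probability, skip rate, balancer probability, bypass scale bound, ...) that varies as a piecewise-linear function of the global batch count. A simplified stand-in for the lookup, with an assumed two-point schedule; by this stage of training (batch_count above 800k) most values have long since reached their final constant:

    def scheduled_float(batch_count: float, schedule) -> float:
        # schedule: [(batch, value), ...] sorted by batch; values are clamped
        # at the ends and interpolated linearly in between.
        if batch_count <= schedule[0][0]:
            return schedule[0][1]
        if batch_count >= schedule[-1][0]:
            return schedule[-1][1]
        for (b0, v0), (b1, v1) in zip(schedule, schedule[1:]):
            if b0 <= batch_count <= b1:
                t = (batch_count - b0) / (b1 - b0)
                return v0 + t * (v1 - v0)

    # Example with an assumed schedule: fully decayed long before batch 821k.
    scheduled_float(821206.67, [(0.0, 0.3), (20000.0, 0.1)])  # -> 0.1
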
], tot_loss[loss=0.08511, simple_loss=0.1052, pruned_loss=0.02206, audio_tagging_loss=0.01047, over 3054970.08 frames. ], batch size: 55, lr: 6.37e-03, grad_scale: 32.0 2023-11-19 22:05:34,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=823606.6666666666, ans=0.125 2023-11-19 22:05:40,806 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.210e+01 8.419e+01 8.972e+01 9.610e+01 1.838e+02, threshold=1.794e+02, percent-clipped=1.0 2023-11-19 22:05:40,962 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 123550 2023-11-19 22:05:45,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=823673.3333333334, ans=0.07 2023-11-19 22:06:03,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=823740.0, ans=0.0 2023-11-19 22:06:07,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=823740.0, ans=0.1 2023-11-19 22:06:23,927 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 3350, loss[loss=0.08246, simple_loss=0.09907, pruned_loss=0.02451, audio_tagging_loss=0.008418, over 15097.00 frames. ], tot_loss[loss=0.0843, simple_loss=0.1041, pruned_loss=0.02181, audio_tagging_loss=0.01043, over 3052777.22 frames. ], batch size: 59, lr: 6.37e-03, grad_scale: 32.0 2023-11-19 22:06:46,376 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 123600 2023-11-19 22:06:46,875 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.82 vs. limit=12.0 2023-11-19 22:06:58,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=824006.6666666666, ans=0.125 2023-11-19 22:06:58,523 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.51 vs. limit=15.0 2023-11-19 22:07:16,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=824140.0, ans=0.2 2023-11-19 22:07:20,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=824140.0, ans=0.2 2023-11-19 22:07:29,856 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 3400, loss[loss=0.06988, simple_loss=0.08378, pruned_loss=0.01401, audio_tagging_loss=0.01398, over 16230.00 frames. ], tot_loss[loss=0.08463, simple_loss=0.1049, pruned_loss=0.02188, audio_tagging_loss=0.0103, over 3046932.25 frames. 
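Note on the Balancer values above (balancer1.prob, balancer2.prob, min_positive, max_positive, min_abs, max_abs): prob (here typically 0.125) appears to be the probability that the constraint is applied on a given batch, and the bounds limit each channel's fraction of positive activations and its magnitude. A rough, diagnostic-only stand-in for the positive-fraction check; the real module in scaling.py injects its correction through a custom autograd function rather than an explicit penalty term:

    import torch

    def balancer_positive_penalty(x: torch.Tensor,
                                  min_positive: float = 0.05,
                                  max_positive: float = 0.95) -> torch.Tensor:
        # x: (num_frames, num_channels). Zero while every channel keeps its
        # fraction of positive activations inside [min_positive, max_positive].
        # (x > 0) is not differentiable; this sketch is diagnostic only.
        pos_frac = (x > 0).float().mean(dim=0)
        return ((pos_frac - max_positive).clamp(min=0)
                + (min_positive - pos_frac).clamp(min=0)).sum()
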
], batch size: 58, lr: 6.36e-03, grad_scale: 32.0 2023-11-19 22:07:50,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=824273.3333333334, ans=0.04949747468305833 2023-11-19 22:07:50,896 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.806e+01 8.550e+01 9.235e+01 1.051e+02 1.197e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-19 22:07:51,063 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 123650 2023-11-19 22:07:51,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=824273.3333333334, ans=0.1 2023-11-19 22:08:06,665 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0 2023-11-19 22:08:10,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=824406.6666666666, ans=0.0 2023-11-19 22:08:10,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=824406.6666666666, ans=0.125 2023-11-19 22:08:14,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=824406.6666666666, ans=0.0 2023-11-19 22:08:33,944 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 3450, loss[loss=0.07282, simple_loss=0.09331, pruned_loss=0.01704, audio_tagging_loss=0.009128, over 16252.00 frames. ], tot_loss[loss=0.08471, simple_loss=0.1052, pruned_loss=0.02192, audio_tagging_loss=0.0102, over 3051578.29 frames. ], batch size: 61, lr: 6.36e-03, grad_scale: 32.0 2023-11-19 22:08:40,674 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.67 vs. limit=15.0 2023-11-19 22:08:41,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=824540.0, ans=0.2 2023-11-19 22:08:41,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=824540.0, ans=0.125 2023-11-19 22:08:45,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=824606.6666666666, ans=0.0 2023-11-19 22:08:49,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=824606.6666666666, ans=0.125 2023-11-19 22:08:56,270 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 123700 2023-11-19 22:09:09,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=824673.3333333334, ans=0.1 2023-11-19 22:09:09,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=824673.3333333334, ans=0.09899494936611666 2023-11-19 22:09:22,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=824740.0, ans=0.5 2023-11-19 22:09:38,696 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 3500, loss[loss=0.1042, simple_loss=0.1461, pruned_loss=0.02393, audio_tagging_loss=0.00726, over 15321.00 frames. 
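Note on the optim.py:476 entries: the five grad-norm numbers read as min/Q1/median/Q3/max over recently observed gradient norms, and the threshold tracks clipping_scale times the median (e.g. 2.0 * 8.871e+01 ≈ 1.774e+02 in the entry above); percent-clipped is the share of recent steps whose norm exceeded it. A sketch of that bookkeeping:

    import torch

    def grad_norm_summary(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
        # grad_norms: 1-D tensor of recent per-step gradient norms.
        q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]  # threshold ≈ clipping_scale × median
        percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
        return q, threshold, percent_clipped
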
], tot_loss[loss=0.08404, simple_loss=0.1043, pruned_loss=0.0217, audio_tagging_loss=0.0102, over 3055504.23 frames. ], batch size: 54, lr: 6.36e-03, grad_scale: 32.0 2023-11-19 22:09:56,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=824940.0, ans=0.0 2023-11-19 22:09:56,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=824940.0, ans=0.125 2023-11-19 22:10:01,203 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.383e+01 8.617e+01 9.360e+01 1.039e+02 1.365e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-19 22:10:01,350 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 123750 2023-11-19 22:10:05,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=825006.6666666666, ans=0.0 2023-11-19 22:10:06,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=825006.6666666666, ans=0.0 2023-11-19 22:10:12,146 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 22:10:14,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=825006.6666666666, ans=0.125 2023-11-19 22:10:26,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=825073.3333333334, ans=0.0 2023-11-19 22:10:30,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=825140.0, ans=0.125 2023-11-19 22:10:35,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=825140.0, ans=0.0 2023-11-19 22:10:43,985 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 3550, loss[loss=0.09514, simple_loss=0.105, pruned_loss=0.02812, audio_tagging_loss=0.01453, over 14582.00 frames. ], tot_loss[loss=0.08313, simple_loss=0.103, pruned_loss=0.02141, audio_tagging_loss=0.01023, over 3053687.96 frames. ], batch size: 57, lr: 6.36e-03, grad_scale: 32.0 2023-11-19 22:10:54,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=825206.6666666666, ans=0.125 2023-11-19 22:10:55,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=825273.3333333334, ans=0.125 2023-11-19 22:11:01,985 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.50 vs. limit=15.0 2023-11-19 22:11:04,770 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 123800 2023-11-19 22:11:17,139 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.35 vs. 
limit=22.5 2023-11-19 22:11:19,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=825340.0, ans=0.125 2023-11-19 22:11:20,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=825406.6666666666, ans=0.125 2023-11-19 22:11:23,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=825406.6666666666, ans=0.1 2023-11-19 22:11:31,172 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 22:11:46,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=825540.0, ans=0.0 2023-11-19 22:11:47,390 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 3600, loss[loss=0.08032, simple_loss=0.09747, pruned_loss=0.01939, audio_tagging_loss=0.01219, over 14277.00 frames. ], tot_loss[loss=0.08392, simple_loss=0.1038, pruned_loss=0.02176, audio_tagging_loss=0.01025, over 3058840.02 frames. ], batch size: 56, lr: 6.36e-03, grad_scale: 32.0 2023-11-19 22:11:58,045 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.07 vs. limit=15.0 2023-11-19 22:12:06,264 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2023-11-19 22:12:08,995 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.358e+01 8.243e+01 9.327e+01 1.037e+02 1.432e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-19 22:12:09,141 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 123850 2023-11-19 22:12:21,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=825673.3333333334, ans=0.0 2023-11-19 22:12:34,661 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.22 vs. limit=15.0 2023-11-19 22:12:52,097 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 3650, loss[loss=0.08944, simple_loss=0.1086, pruned_loss=0.02282, audio_tagging_loss=0.01232, over 15388.00 frames. ], tot_loss[loss=0.08469, simple_loss=0.1046, pruned_loss=0.02217, audio_tagging_loss=0.0102, over 3050284.50 frames. ], batch size: 57, lr: 6.36e-03, grad_scale: 32.0 2023-11-19 22:13:14,720 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.01 vs. limit=22.5 2023-11-19 22:13:15,361 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 123900 2023-11-19 22:13:19,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=826006.6666666666, ans=0.1 2023-11-19 22:13:45,637 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.24 vs. limit=15.0 2023-11-19 22:13:57,503 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 3700, loss[loss=0.09469, simple_loss=0.1159, pruned_loss=0.02803, audio_tagging_loss=0.008713, over 15746.00 frames. ], tot_loss[loss=0.08447, simple_loss=0.1045, pruned_loss=0.02206, audio_tagging_loss=0.01015, over 3050415.31 frames. 
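Note on the train_asr.py:1506 warnings: they drop AudioSet cuts whose encoder output would be shorter than the target token sequence. Each excluded cut has 100 input frames, 23 frames after the 4x subsampling, but 24 BPE tokens of placeholder text. A sketch of the predicate, assuming the simplest frames-vs-tokens rule (the actual check may add a safety margin):

    def keep_cut(num_frames_after_subsampling: int, num_tokens: int) -> bool:
        # The transducer loss needs at least one encoder frame per output
        # token; the excluded cuts above fail with 23 frames vs. 24 tokens.
        return num_frames_after_subsampling >= num_tokens
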
], batch size: 57, lr: 6.36e-03, grad_scale: 32.0 2023-11-19 22:14:13,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=826273.3333333334, ans=0.0 2023-11-19 22:14:15,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=826273.3333333334, ans=0.2 2023-11-19 22:14:17,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=826273.3333333334, ans=0.125 2023-11-19 22:14:18,775 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.848e+01 8.345e+01 9.061e+01 9.917e+01 1.388e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-19 22:14:18,936 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 123950 2023-11-19 22:14:46,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=826406.6666666666, ans=0.025 2023-11-19 22:15:01,856 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 3750, loss[loss=0.07627, simple_loss=0.09646, pruned_loss=0.01692, audio_tagging_loss=0.01112, over 14802.00 frames. ], tot_loss[loss=0.08407, simple_loss=0.104, pruned_loss=0.02186, audio_tagging_loss=0.0102, over 3051032.03 frames. ], batch size: 55, lr: 6.36e-03, grad_scale: 32.0 2023-11-19 22:15:13,557 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.99 vs. limit=22.5 2023-11-19 22:15:23,438 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 124000 2023-11-19 22:15:24,912 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-124000.pt 2023-11-19 22:15:50,610 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 22:16:00,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=826806.6666666666, ans=0.0 2023-11-19 22:16:06,321 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.33 vs. limit=22.5 2023-11-19 22:16:09,647 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 3800, loss[loss=0.0902, simple_loss=0.1089, pruned_loss=0.02623, audio_tagging_loss=0.009542, over 15455.00 frames. ], tot_loss[loss=0.08377, simple_loss=0.1035, pruned_loss=0.02178, audio_tagging_loss=0.01027, over 3058658.98 frames. 
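Note on checkpoint.py:75: it fires on round-numbered batch indices (here 124000), writing checkpoint-<batch_idx>.pt into the experiment directory. A sketch of the periodic save, with the 4000-batch interval inferred from the checkpoint name and therefore illustrative; the real helper also stores optimizer, scheduler, sampler and grad-scaler state:

    import torch

    def maybe_save_checkpoint(model, batch_idx: int, exp_dir: str,
                              save_every_n: int = 4000) -> None:
        # Hypothetical minimal version of the batch-level checkpoint save.
        if batch_idx > 0 and batch_idx % save_every_n == 0:
            torch.save({"model": model.state_dict(),
                        "batch_idx_train": batch_idx},
                       f"{exp_dir}/checkpoint-{batch_idx}.pt")
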
], batch size: 59, lr: 6.35e-03, grad_scale: 32.0 2023-11-19 22:16:32,431 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 8.419e+01 9.148e+01 9.794e+01 1.250e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-19 22:16:32,601 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 124050 2023-11-19 22:16:53,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=827073.3333333334, ans=0.1 2023-11-19 22:17:07,906 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.27 vs. limit=15.0 2023-11-19 22:17:13,893 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 3850, loss[loss=0.08985, simple_loss=0.1124, pruned_loss=0.02389, audio_tagging_loss=0.00978, over 15972.00 frames. ], tot_loss[loss=0.08449, simple_loss=0.1044, pruned_loss=0.02198, audio_tagging_loss=0.0103, over 3065369.19 frames. ], batch size: 57, lr: 6.35e-03, grad_scale: 32.0 2023-11-19 22:17:15,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=827206.6666666666, ans=0.2 2023-11-19 22:17:25,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=827273.3333333334, ans=0.0 2023-11-19 22:17:35,532 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 124100 2023-11-19 22:17:42,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=827340.0, ans=0.125 2023-11-19 22:18:01,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=827406.6666666666, ans=0.0 2023-11-19 22:18:03,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=827473.3333333334, ans=0.2 2023-11-19 22:18:16,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=827473.3333333334, ans=15.0 2023-11-19 22:18:18,123 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 3900, loss[loss=0.09371, simple_loss=0.1243, pruned_loss=0.02085, audio_tagging_loss=0.01072, over 16360.00 frames. ], tot_loss[loss=0.08476, simple_loss=0.1044, pruned_loss=0.02218, audio_tagging_loss=0.01037, over 3057915.78 frames. 
], batch size: 59, lr: 6.35e-03, grad_scale: 32.0 2023-11-19 22:18:37,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=827606.6666666666, ans=0.125 2023-11-19 22:18:39,673 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.879e+01 8.442e+01 9.156e+01 1.005e+02 1.586e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-19 22:18:39,817 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 124150 2023-11-19 22:18:41,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=827606.6666666666, ans=0.125 2023-11-19 22:19:05,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=827740.0, ans=0.0 2023-11-19 22:19:15,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=827806.6666666666, ans=0.125 2023-11-19 22:19:21,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=827873.3333333334, ans=0.125 2023-11-19 22:19:22,042 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 3950, loss[loss=0.06933, simple_loss=0.08431, pruned_loss=0.01356, audio_tagging_loss=0.01362, over 16044.00 frames. ], tot_loss[loss=0.08462, simple_loss=0.1043, pruned_loss=0.02205, audio_tagging_loss=0.01043, over 3057260.44 frames. ], batch size: 62, lr: 6.35e-03, grad_scale: 32.0 2023-11-19 22:19:38,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=827940.0, ans=0.1 2023-11-19 22:19:44,123 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 124200 2023-11-19 22:19:47,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=828006.6666666666, ans=0.125 2023-11-19 22:20:15,308 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.42 vs. limit=15.0 2023-11-19 22:20:19,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=828140.0, ans=0.125 2023-11-19 22:20:27,568 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 4000, loss[loss=0.09885, simple_loss=0.1296, pruned_loss=0.02532, audio_tagging_loss=0.008702, over 14902.00 frames. ], tot_loss[loss=0.08485, simple_loss=0.1045, pruned_loss=0.02209, audio_tagging_loss=0.01052, over 3055751.44 frames. ], batch size: 54, lr: 6.35e-03, grad_scale: 32.0 2023-11-19 22:20:45,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=828273.3333333334, ans=0.125 2023-11-19 22:20:49,746 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 8.189e+01 8.890e+01 9.727e+01 1.231e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-19 22:20:49,889 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 124250 2023-11-19 22:21:05,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=828406.6666666666, ans=15.0 2023-11-19 22:21:31,905 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 4050, loss[loss=0.0971, simple_loss=0.1166, pruned_loss=0.02999, audio_tagging_loss=0.008838, over 14361.00 frames. 
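Note on the scaling.py:1022 "Whitening" entries: each compares a per-module statistic against a limit (e.g. metric=13.27 vs. limit=15.0); when the metric exceeds the limit, the Whiten module applies a gradient penalty that pushes the activations' channel covariance back toward white. The exact statistic is defined in scaling.py; a loose stand-in that is 1.0 for a perfectly white covariance and grows as the eigenvalue spectrum becomes lopsided:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels); a rough proxy, not the exact formula.
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)  # eigenvalues of the covariance
        return (eigs ** 2).mean() / (eigs.mean() ** 2)
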
], tot_loss[loss=0.08586, simple_loss=0.1057, pruned_loss=0.02253, audio_tagging_loss=0.01045, over 3055219.22 frames. ], batch size: 56, lr: 6.35e-03, grad_scale: 32.0 2023-11-19 22:21:36,204 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 22:21:37,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=828540.0, ans=0.1 2023-11-19 22:21:43,084 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.79 vs. limit=15.0 2023-11-19 22:21:47,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=828606.6666666666, ans=0.0 2023-11-19 22:21:53,943 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 124300 2023-11-19 22:22:31,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=828806.6666666666, ans=0.0 2023-11-19 22:22:36,336 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 4100, loss[loss=0.07615, simple_loss=0.0946, pruned_loss=0.01711, audio_tagging_loss=0.01174, over 15365.00 frames. ], tot_loss[loss=0.08597, simple_loss=0.1059, pruned_loss=0.02257, audio_tagging_loss=0.01044, over 3052495.49 frames. ], batch size: 57, lr: 6.35e-03, grad_scale: 32.0 2023-11-19 22:22:39,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=828873.3333333334, ans=0.0 2023-11-19 22:22:43,738 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.74 vs. limit=10.0 2023-11-19 22:22:58,270 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.257e+01 8.196e+01 8.855e+01 9.661e+01 1.383e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-19 22:22:58,408 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 124350 2023-11-19 22:23:01,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=829006.6666666666, ans=0.0 2023-11-19 22:23:15,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=829073.3333333334, ans=0.125 2023-11-19 22:23:22,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=829073.3333333334, ans=0.2 2023-11-19 22:23:22,085 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 22:23:40,909 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 4150, loss[loss=0.1303, simple_loss=0.1566, pruned_loss=0.04037, audio_tagging_loss=0.01161, over 14913.00 frames. ], tot_loss[loss=0.08635, simple_loss=0.1064, pruned_loss=0.02281, audio_tagging_loss=0.01032, over 3048546.07 frames. 
], batch size: 53, lr: 6.35e-03, grad_scale: 32.0 2023-11-19 22:23:44,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=829206.6666666666, ans=0.1 2023-11-19 22:23:55,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=829273.3333333334, ans=0.125 2023-11-19 22:24:00,987 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.82 vs. limit=15.0 2023-11-19 22:24:02,792 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 124400 2023-11-19 22:24:27,475 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 22:24:31,606 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.80 vs. limit=22.5 2023-11-19 22:24:38,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=829473.3333333334, ans=0.0 2023-11-19 22:24:40,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=829473.3333333334, ans=0.125 2023-11-19 22:24:45,686 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 4200, loss[loss=0.1015, simple_loss=0.123, pruned_loss=0.02971, audio_tagging_loss=0.01033, over 15716.00 frames. ], tot_loss[loss=0.08515, simple_loss=0.105, pruned_loss=0.02236, audio_tagging_loss=0.01028, over 3050457.54 frames. ], batch size: 57, lr: 6.34e-03, grad_scale: 32.0 2023-11-19 22:24:57,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=829606.6666666666, ans=0.1 2023-11-19 22:25:00,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=829606.6666666666, ans=0.0 2023-11-19 22:25:07,647 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.364e+01 8.135e+01 8.898e+01 9.525e+01 1.896e+02, threshold=1.780e+02, percent-clipped=1.0 2023-11-19 22:25:07,813 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 124450 2023-11-19 22:25:14,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=829673.3333333334, ans=0.125 2023-11-19 22:25:15,754 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.82 vs. 
limit=15.0 2023-11-19 22:25:17,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=829673.3333333334, ans=0.1 2023-11-19 22:25:17,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=829673.3333333334, ans=0.0 2023-11-19 22:25:23,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=829740.0, ans=10.0 2023-11-19 22:25:50,199 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 4250, loss[loss=0.08423, simple_loss=0.09891, pruned_loss=0.024, audio_tagging_loss=0.01077, over 15019.00 frames. ], tot_loss[loss=0.0854, simple_loss=0.1057, pruned_loss=0.02233, audio_tagging_loss=0.01019, over 3050764.08 frames. ], batch size: 56, lr: 6.34e-03, grad_scale: 16.0 2023-11-19 22:25:52,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=829873.3333333334, ans=15.0 2023-11-19 22:26:12,418 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 124500 2023-11-19 22:26:21,382 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 22:26:39,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=830073.3333333334, ans=0.125 2023-11-19 22:26:55,224 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 4300, loss[loss=0.07064, simple_loss=0.08806, pruned_loss=0.01816, audio_tagging_loss=0.008448, over 14458.00 frames. ], tot_loss[loss=0.08547, simple_loss=0.106, pruned_loss=0.02235, audio_tagging_loss=0.0101, over 3043246.65 frames. ], batch size: 55, lr: 6.34e-03, grad_scale: 16.0 2023-11-19 22:27:16,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=830273.3333333334, ans=0.1 2023-11-19 22:27:17,360 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 124550 2023-11-19 22:27:17,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=830273.3333333334, ans=0.1 2023-11-19 22:27:18,442 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.753e+01 8.093e+01 8.901e+01 9.811e+01 2.323e+02, threshold=1.780e+02, percent-clipped=1.0 2023-11-19 22:27:20,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=830340.0, ans=0.2 2023-11-19 22:27:20,466 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.84 vs. 
limit=22.5 2023-11-19 22:27:33,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=830406.6666666666, ans=0.125 2023-11-19 22:27:39,619 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 22:27:54,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=830473.3333333334, ans=0.125 2023-11-19 22:27:55,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=830473.3333333334, ans=0.125 2023-11-19 22:27:59,297 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 4350, loss[loss=0.08774, simple_loss=0.1093, pruned_loss=0.02325, audio_tagging_loss=0.009857, over 16586.00 frames. ], tot_loss[loss=0.08522, simple_loss=0.1059, pruned_loss=0.02226, audio_tagging_loss=0.01001, over 3042193.32 frames. ], batch size: 63, lr: 6.34e-03, grad_scale: 16.0 2023-11-19 22:28:15,154 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.24 vs. limit=12.0 2023-11-19 22:28:20,729 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 124600 2023-11-19 22:28:51,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=830806.6666666666, ans=0.0 2023-11-19 22:28:53,743 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.59 vs. limit=22.5 2023-11-19 22:28:57,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=830806.6666666666, ans=0.1 2023-11-19 22:29:03,771 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 4400, loss[loss=0.06382, simple_loss=0.08139, pruned_loss=0.01367, audio_tagging_loss=0.009458, over 16292.00 frames. ], tot_loss[loss=0.0846, simple_loss=0.1053, pruned_loss=0.02196, audio_tagging_loss=0.01, over 3040096.10 frames. ], batch size: 61, lr: 6.34e-03, grad_scale: 32.0 2023-11-19 22:29:26,141 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 124650 2023-11-19 22:29:27,896 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.389e+01 8.577e+01 9.158e+01 9.839e+01 1.465e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-19 22:30:09,319 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 4450, loss[loss=0.08872, simple_loss=0.1197, pruned_loss=0.02039, audio_tagging_loss=0.008464, over 14304.00 frames. ], tot_loss[loss=0.08446, simple_loss=0.1051, pruned_loss=0.02195, audio_tagging_loss=0.009963, over 3037748.34 frames. ], batch size: 53, lr: 6.34e-03, grad_scale: 32.0 2023-11-19 22:30:14,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=831206.6666666666, ans=0.125 2023-11-19 22:30:15,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=831206.6666666666, ans=0.2 2023-11-19 22:30:31,350 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 124700 2023-11-19 22:31:13,690 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 4500, loss[loss=0.07894, simple_loss=0.1051, pruned_loss=0.01778, audio_tagging_loss=0.008611, over 16248.00 frames. 
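Note on the two recurring attention diagnostics: the scaling.py:1118 "WithLoss" lines report the accumulated auxiliary loss attached to a self_attn_weights module (often exactly 0.000e+00 when the regularizer is inactive), and during validation zipformer.py:1873 prints one attention-entropy value per head, where a value near log(src_len) means nearly uniform attention. A sketch of the per-head entropy:

    import torch

    def attention_entropy_per_head(attn: torch.Tensor) -> torch.Tensor:
        # attn: (num_heads, tgt_len, src_len) with rows summing to 1.
        p = attn.clamp(min=1e-20)
        return -(p * p.log()).sum(dim=-1).mean(dim=-1)  # one value per head
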
], tot_loss[loss=0.08413, simple_loss=0.1045, pruned_loss=0.02188, audio_tagging_loss=0.00999, over 3042972.68 frames. ], batch size: 62, lr: 6.34e-03, grad_scale: 32.0 2023-11-19 22:31:35,381 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 124750 2023-11-19 22:31:36,463 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.020e+01 8.327e+01 8.964e+01 9.884e+01 1.189e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-19 22:32:18,448 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 4550, loss[loss=0.1097, simple_loss=0.1482, pruned_loss=0.02861, audio_tagging_loss=0.006958, over 16092.00 frames. ], tot_loss[loss=0.08392, simple_loss=0.104, pruned_loss=0.02183, audio_tagging_loss=0.01009, over 3040299.17 frames. ], batch size: 58, lr: 6.34e-03, grad_scale: 32.0 2023-11-19 22:32:21,234 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=6.396e-01 2023-11-19 22:32:35,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=831940.0, ans=0.1 2023-11-19 22:32:40,918 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 124800 2023-11-19 22:32:51,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=832006.6666666666, ans=0.1 2023-11-19 22:32:59,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=832073.3333333334, ans=0.125 2023-11-19 22:33:03,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=832073.3333333334, ans=0.1 2023-11-19 22:33:08,667 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 22:33:19,848 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.79 vs. limit=15.0 2023-11-19 22:33:24,154 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 4600, loss[loss=0.07994, simple_loss=0.09677, pruned_loss=0.02323, audio_tagging_loss=0.008326, over 15550.00 frames. ], tot_loss[loss=0.08378, simple_loss=0.1035, pruned_loss=0.02181, audio_tagging_loss=0.01021, over 3045323.02 frames. 
], batch size: 58, lr: 6.33e-03, grad_scale: 32.0 2023-11-19 22:33:32,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=832206.6666666666, ans=0.125 2023-11-19 22:33:41,542 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 22:33:46,699 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 124850 2023-11-19 22:33:47,805 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.870e+01 8.306e+01 8.909e+01 9.778e+01 1.815e+02, threshold=1.782e+02, percent-clipped=2.0 2023-11-19 22:33:56,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=832340.0, ans=0.1 2023-11-19 22:34:14,127 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.07 vs. limit=22.5 2023-11-19 22:34:19,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=832473.3333333334, ans=0.125 2023-11-19 22:34:29,628 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 4650, loss[loss=0.06463, simple_loss=0.07646, pruned_loss=0.01621, audio_tagging_loss=0.01019, over 14256.00 frames. ], tot_loss[loss=0.08305, simple_loss=0.1023, pruned_loss=0.02151, audio_tagging_loss=0.01037, over 3049888.79 frames. ], batch size: 55, lr: 6.33e-03, grad_scale: 32.0 2023-11-19 22:34:51,377 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 124900 2023-11-19 22:34:54,601 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.37 vs. limit=15.0 2023-11-19 22:35:04,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=832673.3333333334, ans=0.0 2023-11-19 22:35:27,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=832806.6666666666, ans=0.125 2023-11-19 22:35:32,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=832806.6666666666, ans=10.0 2023-11-19 22:35:34,383 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 4700, loss[loss=0.08472, simple_loss=0.1009, pruned_loss=0.02228, audio_tagging_loss=0.01197, over 15182.00 frames. ], tot_loss[loss=0.0839, simple_loss=0.1035, pruned_loss=0.02181, audio_tagging_loss=0.01034, over 3049745.29 frames. 
], batch size: 56, lr: 6.33e-03, grad_scale: 32.0 2023-11-19 22:35:48,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=832940.0, ans=0.2 2023-11-19 22:35:55,640 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 124950 2023-11-19 22:35:56,677 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.758e+01 8.340e+01 9.099e+01 9.695e+01 1.346e+02, threshold=1.820e+02, percent-clipped=0.0 2023-11-19 22:36:09,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=833006.6666666666, ans=0.035 2023-11-19 22:36:10,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=833006.6666666666, ans=0.0 2023-11-19 22:36:30,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=833140.0, ans=0.125 2023-11-19 22:36:38,648 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 4750, loss[loss=0.105, simple_loss=0.1341, pruned_loss=0.02995, audio_tagging_loss=0.008002, over 15085.00 frames. ], tot_loss[loss=0.0836, simple_loss=0.1031, pruned_loss=0.02166, audio_tagging_loss=0.01038, over 3042538.71 frames. ], batch size: 56, lr: 6.33e-03, grad_scale: 32.0 2023-11-19 22:37:00,277 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 125000 2023-11-19 22:37:14,988 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2023-11-19 22:37:16,141 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.60 vs. limit=15.0 2023-11-19 22:37:22,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=833406.6666666666, ans=0.0 2023-11-19 22:37:37,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=833473.3333333334, ans=0.125 2023-11-19 22:37:42,738 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 4800, loss[loss=0.1007, simple_loss=0.1198, pruned_loss=0.03099, audio_tagging_loss=0.009872, over 15343.00 frames. ], tot_loss[loss=0.08468, simple_loss=0.1045, pruned_loss=0.022, audio_tagging_loss=0.01043, over 3048115.81 frames. 
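Note on the grad_scale field in the batch summaries: this is fp16 dynamic loss scaling, where the scale doubles after a long enough run of overflow-free steps (16.0 earlier in this excerpt, 32.0 from batch 3200 on) and is halved when a step overflows. A sketch using PyTorch's stock GradScaler; icefall manages the scale inside its own loop, and the constructor arguments here are illustrative:

    import torch

    model = torch.nn.Linear(80, 500).cuda()          # assumes a CUDA device
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)

    x = torch.randn(8, 80, device="cuda")
    with torch.cuda.amp.autocast():
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()                    # backward on the scaled loss
    scaler.step(opt)                                 # unscales; skips step on overflow
    scaler.update()                                  # grows or backs off the scale
    print(scaler.get_scale())                        # e.g. 16.0 -> 32.0 over time
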
], batch size: 56, lr: 6.33e-03, grad_scale: 32.0 2023-11-19 22:37:44,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=833540.0, ans=0.025 2023-11-19 22:37:50,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=833540.0, ans=0.0 2023-11-19 22:38:00,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=833606.6666666666, ans=0.95 2023-11-19 22:38:03,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=833606.6666666666, ans=0.0 2023-11-19 22:38:04,924 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 125050 2023-11-19 22:38:07,364 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.959e+01 8.107e+01 9.125e+01 1.005e+02 1.234e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-19 22:38:16,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=833673.3333333334, ans=0.125 2023-11-19 22:38:19,158 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 22:38:22,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=833740.0, ans=0.2 2023-11-19 22:38:37,830 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.93 vs. limit=12.0 2023-11-19 22:38:46,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=833873.3333333334, ans=0.2 2023-11-19 22:38:47,591 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 4850, loss[loss=0.07264, simple_loss=0.07482, pruned_loss=0.0203, audio_tagging_loss=0.01492, over 14837.00 frames. ], tot_loss[loss=0.08479, simple_loss=0.1044, pruned_loss=0.02204, audio_tagging_loss=0.01056, over 3050066.89 frames. ], batch size: 59, lr: 6.33e-03, grad_scale: 32.0 2023-11-19 22:38:54,429 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.73 vs. limit=15.0 2023-11-19 22:38:58,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=833940.0, ans=0.0 2023-11-19 22:39:01,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=833940.0, ans=0.0 2023-11-19 22:39:01,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=833940.0, ans=0.0 2023-11-19 22:39:09,128 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 125100 2023-11-19 22:39:46,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=834140.0, ans=0.2 2023-11-19 22:39:51,859 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 4900, loss[loss=0.07031, simple_loss=0.08648, pruned_loss=0.01889, audio_tagging_loss=0.00818, over 14881.00 frames. ], tot_loss[loss=0.08412, simple_loss=0.1033, pruned_loss=0.02188, audio_tagging_loss=0.01059, over 3045928.48 frames. 
], batch size: 56, lr: 6.33e-03, grad_scale: 32.0 2023-11-19 22:40:13,956 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 125150 2023-11-19 22:40:16,266 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.135e+01 8.432e+01 8.847e+01 9.512e+01 1.221e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-19 22:40:29,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=834406.6666666666, ans=0.125 2023-11-19 22:40:31,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=834406.6666666666, ans=0.125 2023-11-19 22:40:50,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=834473.3333333334, ans=0.015 2023-11-19 22:40:55,094 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 4950, loss[loss=0.07968, simple_loss=0.08987, pruned_loss=0.02211, audio_tagging_loss=0.01264, over 14351.00 frames. ], tot_loss[loss=0.08342, simple_loss=0.103, pruned_loss=0.02151, audio_tagging_loss=0.01041, over 3044012.32 frames. ], batch size: 54, lr: 6.33e-03, grad_scale: 32.0 2023-11-19 22:41:17,911 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 125200 2023-11-19 22:41:32,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=834673.3333333334, ans=0.125 2023-11-19 22:41:37,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=834740.0, ans=0.125 2023-11-19 22:41:47,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=834806.6666666666, ans=0.125 2023-11-19 22:41:58,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=834806.6666666666, ans=0.125 2023-11-19 22:42:00,384 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 5000, loss[loss=0.08185, simple_loss=0.1022, pruned_loss=0.02226, audio_tagging_loss=0.008513, over 16961.00 frames. ], tot_loss[loss=0.08321, simple_loss=0.103, pruned_loss=0.02139, audio_tagging_loss=0.0103, over 3047445.35 frames. ], batch size: 63, lr: 6.32e-03, grad_scale: 32.0 2023-11-19 22:42:19,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=834940.0, ans=0.0 2023-11-19 22:42:19,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=834940.0, ans=0.1 2023-11-19 22:42:22,129 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 125250 2023-11-19 22:42:24,394 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.897e+01 8.190e+01 8.915e+01 9.653e+01 1.690e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-19 22:42:28,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=835006.6666666666, ans=0.1 2023-11-19 22:42:34,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=835006.6666666666, ans=0.0 2023-11-19 22:43:04,423 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 5050, loss[loss=0.07641, simple_loss=0.09511, pruned_loss=0.02087, audio_tagging_loss=0.007994, over 14718.00 frames. 
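Note on the lr column: it decays slowly across these batches (6.38e-03 down to 6.31e-03) because the Eden scheduler discounts the base learning rate by both the batch and epoch counts. A sketch of the schedule as implemented in icefall's optim.py; the parameter values below are assumptions chosen so the output matches this log:

    def eden_lr(base_lr: float, batch: float, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # eden_lr(0.045, batch=123200, epoch=10) ≈ 6.38e-03, in line with the
    # values printed in these summaries.
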
], tot_loss[loss=0.08314, simple_loss=0.103, pruned_loss=0.02144, audio_tagging_loss=0.01019, over 3049545.25 frames. ], batch size: 55, lr: 6.32e-03, grad_scale: 32.0 2023-11-19 22:43:10,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=835206.6666666666, ans=0.125 2023-11-19 22:43:15,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=835206.6666666666, ans=0.2 2023-11-19 22:43:23,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=835273.3333333334, ans=0.2 2023-11-19 22:43:26,331 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 125300 2023-11-19 22:43:45,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=835406.6666666666, ans=0.2 2023-11-19 22:44:08,446 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 5100, loss[loss=0.1073, simple_loss=0.1466, pruned_loss=0.02791, audio_tagging_loss=0.006065, over 15671.00 frames. ], tot_loss[loss=0.08305, simple_loss=0.1029, pruned_loss=0.02152, audio_tagging_loss=0.0101, over 3046060.77 frames. ], batch size: 56, lr: 6.32e-03, grad_scale: 32.0 2023-11-19 22:44:10,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=835540.0, ans=0.125 2023-11-19 22:44:12,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=835540.0, ans=0.2 2023-11-19 22:44:21,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=835606.6666666666, ans=0.0 2023-11-19 22:44:31,235 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 125350 2023-11-19 22:44:33,053 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.63 vs. limit=22.5 2023-11-19 22:44:33,608 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.822e+01 8.119e+01 8.827e+01 9.463e+01 1.199e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-19 22:44:46,359 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.46 vs. limit=15.0 2023-11-19 22:45:04,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=835806.6666666666, ans=15.0 2023-11-19 22:45:08,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=835806.6666666666, ans=0.125 2023-11-19 22:45:14,066 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 5150, loss[loss=0.06109, simple_loss=0.0751, pruned_loss=0.01158, audio_tagging_loss=0.01196, over 15027.00 frames. ], tot_loss[loss=0.0833, simple_loss=0.103, pruned_loss=0.0217, audio_tagging_loss=0.01011, over 3042525.42 frames. ], batch size: 56, lr: 6.32e-03, grad_scale: 32.0 2023-11-19 22:45:17,111 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.77 vs. limit=15.0 2023-11-19 22:45:28,903 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.27 vs. 
2023-11-19 22:45:36,413 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 125400 2023-11-19 22:45:53,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=836073.3333333334, ans=0.125 2023-11-19 22:45:54,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=836073.3333333334, ans=0.05 2023-11-19 22:45:57,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=836073.3333333334, ans=0.125 2023-11-19 22:46:19,180 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 5200, loss[loss=0.06956, simple_loss=0.07872, pruned_loss=0.02067, audio_tagging_loss=0.00953, over 13764.00 frames. ], tot_loss[loss=0.08301, simple_loss=0.1024, pruned_loss=0.02167, audio_tagging_loss=0.01014, over 3036925.60 frames. ], batch size: 56, lr: 6.32e-03, grad_scale: 32.0 2023-11-19 22:46:40,476 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 125450 2023-11-19 22:46:43,467 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.938e+01 8.292e+01 9.057e+01 9.966e+01 1.254e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-19 22:47:23,305 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 5250, loss[loss=0.09376, simple_loss=0.114, pruned_loss=0.02717, audio_tagging_loss=0.009607, over 15094.00 frames. ], tot_loss[loss=0.08438, simple_loss=0.1044, pruned_loss=0.02217, audio_tagging_loss=0.01001, over 3033007.21 frames. ], batch size: 57, lr: 6.32e-03, grad_scale: 32.0 2023-11-19 22:47:24,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=836540.0, ans=0.0 2023-11-19 22:47:26,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=836540.0, ans=0.125 2023-11-19 22:47:30,265 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.30 vs. limit=15.0 2023-11-19 22:47:42,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=836606.6666666666, ans=0.125 2023-11-19 22:47:43,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=836606.6666666666, ans=0.05 2023-11-19 22:47:45,657 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 125500 2023-11-19 22:48:08,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=836740.0, ans=0.0 2023-11-19 22:48:28,094 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 5300, loss[loss=0.1132, simple_loss=0.1311, pruned_loss=0.03334, audio_tagging_loss=0.01436, over 15052.00 frames. ], tot_loss[loss=0.08435, simple_loss=0.1043, pruned_loss=0.0222, audio_tagging_loss=0.009986, over 3039242.75 frames.
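], batch size: 54, lr: 6.32e-03, grad_scale: 32.0

Note on the ScheduledFloat records that dominate this log: balancer probabilities, skip rates and dropout values (the ans=... field) are not fixed hyper-parameters but functions of batch_count. A minimal piecewise-linear version, assuming schedules are given as (batch_count, value) breakpoints; the real class in scaling.py carries more machinery, this only reproduces the interpolation idea:

    # Sketch: a hyper-parameter scheduled on batch count, clamped at the ends.
    class ScheduledFloat:
        def __init__(self, *points: tuple[float, float]):
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # e.g. a skip rate decaying from 0.3 to 0.0 over the first 20k batches
    # (illustrative breakpoints), long since fully decayed at this point:
    skip = ScheduledFloat((0.0, 0.3), (20000.0, 0.0))
    print(skip.value(836073.33))  # -> 0.0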
2023-11-19 22:48:50,364 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 125550 2023-11-19 22:48:53,348 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.677e+01 8.460e+01 9.151e+01 1.090e+02 1.553e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-19 22:49:02,723 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.40 vs. limit=12.0 2023-11-19 22:49:19,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=837140.0, ans=0.0 2023-11-19 22:49:26,241 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.18 vs. limit=15.0 2023-11-19 22:49:33,673 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 5350, loss[loss=0.07545, simple_loss=0.09771, pruned_loss=0.01695, audio_tagging_loss=0.009651, over 15518.00 frames. ], tot_loss[loss=0.08379, simple_loss=0.1034, pruned_loss=0.02192, audio_tagging_loss=0.01018, over 3038180.88 frames. ], batch size: 58, lr: 6.32e-03, grad_scale: 32.0 2023-11-19 22:49:38,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=837206.6666666666, ans=0.125 2023-11-19 22:49:52,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=837273.3333333334, ans=0.125 2023-11-19 22:49:54,234 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.96 vs. limit=12.0 2023-11-19 22:49:54,921 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 125600 2023-11-19 22:50:08,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=837340.0, ans=0.2 2023-11-19 22:50:13,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=837406.6666666666, ans=0.025 2023-11-19 22:50:24,618 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0 2023-11-19 22:50:37,999 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 5400, loss[loss=0.09249, simple_loss=0.1161, pruned_loss=0.02536, audio_tagging_loss=0.009104, over 14928.00 frames. ], tot_loss[loss=0.08426, simple_loss=0.1041, pruned_loss=0.02197, audio_tagging_loss=0.01022, over 3042852.98 frames. ], batch size: 55, lr: 6.31e-03, grad_scale: 16.0 2023-11-19 22:50:38,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=837540.0, ans=0.0 2023-11-19 22:50:43,567 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.13 vs.
limit=15.0 2023-11-19 22:50:51,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=837606.6666666666, ans=0.0 2023-11-19 22:50:56,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=837606.6666666666, ans=22.5 2023-11-19 22:50:58,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=837606.6666666666, ans=0.125 2023-11-19 22:51:00,256 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 125650 2023-11-19 22:51:03,915 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.803e+01 8.135e+01 8.658e+01 9.508e+01 1.289e+02, threshold=1.732e+02, percent-clipped=0.0 2023-11-19 22:51:11,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=837673.3333333334, ans=0.125 2023-11-19 22:51:22,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=837740.0, ans=0.1 2023-11-19 22:51:28,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=837806.6666666666, ans=0.125 2023-11-19 22:51:30,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=837806.6666666666, ans=0.0 2023-11-19 22:51:36,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=837806.6666666666, ans=0.125 2023-11-19 22:51:42,677 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 5450, loss[loss=0.08345, simple_loss=0.1103, pruned_loss=0.02066, audio_tagging_loss=0.007617, over 14860.00 frames. ], tot_loss[loss=0.08469, simple_loss=0.1049, pruned_loss=0.02202, audio_tagging_loss=0.01023, over 3046150.23 frames. ], batch size: 56, lr: 6.31e-03, grad_scale: 16.0 2023-11-19 22:51:54,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=837940.0, ans=0.0 2023-11-19 22:51:58,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=837940.0, ans=0.0 2023-11-19 22:52:04,554 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 125700 2023-11-19 22:52:17,608 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.31 vs. limit=22.5 2023-11-19 22:52:22,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=838073.3333333334, ans=0.0 2023-11-19 22:52:39,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=838140.0, ans=0.0 2023-11-19 22:52:47,703 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 5500, loss[loss=0.08379, simple_loss=0.1021, pruned_loss=0.0231, audio_tagging_loss=0.009641, over 14527.00 frames. ], tot_loss[loss=0.08467, simple_loss=0.1047, pruned_loss=0.0221, audio_tagging_loss=0.01024, over 3041743.61 frames. 
], batch size: 54, lr: 6.31e-03, grad_scale: 16.0 2023-11-19 22:52:49,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=838206.6666666666, ans=0.125 2023-11-19 22:52:51,971 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2023-11-19 22:52:59,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=838273.3333333334, ans=0.1 2023-11-19 22:52:59,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=838273.3333333334, ans=0.0 2023-11-19 22:53:09,224 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 125750 2023-11-19 22:53:12,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=838340.0, ans=0.2 2023-11-19 22:53:12,830 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.838e+01 8.496e+01 8.990e+01 9.633e+01 1.229e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-19 22:53:28,268 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.86 vs. limit=15.0 2023-11-19 22:53:30,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=838406.6666666666, ans=0.125 2023-11-19 22:53:41,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=838473.3333333334, ans=0.0 2023-11-19 22:53:47,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=838473.3333333334, ans=0.1 2023-11-19 22:53:52,460 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 5550, loss[loss=0.06823, simple_loss=0.07445, pruned_loss=0.01945, audio_tagging_loss=0.01156, over 15983.00 frames. ], tot_loss[loss=0.08463, simple_loss=0.1044, pruned_loss=0.02204, audio_tagging_loss=0.01037, over 3049086.27 frames. 
], batch size: 63, lr: 6.31e-03, grad_scale: 16.0 2023-11-19 22:53:52,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=838540.0, ans=0.0 2023-11-19 22:54:02,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=838540.0, ans=0.0 2023-11-19 22:54:04,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=838606.6666666666, ans=0.125 2023-11-19 22:54:14,287 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 125800 2023-11-19 22:54:21,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=838673.3333333334, ans=0.0 2023-11-19 22:54:24,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=838673.3333333334, ans=0.125 2023-11-19 22:54:46,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=838806.6666666666, ans=0.0 2023-11-19 22:54:46,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=838806.6666666666, ans=0.0 2023-11-19 22:54:52,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=838806.6666666666, ans=0.125 2023-11-19 22:54:57,823 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 5600, loss[loss=0.07928, simple_loss=0.1063, pruned_loss=0.01853, audio_tagging_loss=0.007592, over 16307.00 frames. ], tot_loss[loss=0.08435, simple_loss=0.1042, pruned_loss=0.02188, audio_tagging_loss=0.01037, over 3046544.49 frames. ], batch size: 59, lr: 6.31e-03, grad_scale: 32.0 2023-11-19 22:55:16,602 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2023-11-19 22:55:19,584 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 125850 2023-11-19 22:55:23,386 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.434e+01 8.075e+01 8.698e+01 9.694e+01 1.274e+02, threshold=1.740e+02, percent-clipped=0.0 2023-11-19 22:55:31,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=839006.6666666666, ans=0.0 2023-11-19 22:55:31,374 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.73 vs. limit=15.0 2023-11-19 22:55:46,048 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 22:56:02,503 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 5650, loss[loss=0.06082, simple_loss=0.07229, pruned_loss=0.01324, audio_tagging_loss=0.01144, over 14536.00 frames. ], tot_loss[loss=0.08445, simple_loss=0.1041, pruned_loss=0.02195, audio_tagging_loss=0.01047, over 3051474.63 frames. 
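], batch size: 56, lr: 6.31e-03, grad_scale: 32.0

Note on the WARNING at 22:55:46 above: the one-second AudioSet clip is dropped because its 100 feature frames shrink to 23 after the encoder's roughly 4x subsampling, while its placeholder transcript has 24 tokens, and transducer-style alignment needs at least one encoder frame per output token. A sketch of such a length filter; the (T - 7) // 4 mapping is an assumption that happens to reproduce 100 -> 23, the exact formula lives in the encoder's convolutional front end:

    # Sketch: exclude cuts whose subsampled length cannot cover the tokens.
    def frames_after_subsampling(num_frames: int) -> int:
        # assumed approximation of the conv front end: 100 -> 23
        return (num_frames - 7) // 4

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # transducer-style alignment needs T >= U
        return frames_after_subsampling(num_frames) >= num_tokens

    print(keep_cut(100, 24))  # False: 23 frames < 24 tokens, cut is excluded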
2023-11-19 22:56:24,935 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 125900 2023-11-19 22:56:35,899 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.49 vs. limit=15.0 2023-11-19 22:56:36,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=839340.0, ans=0.125 2023-11-19 22:56:37,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=839340.0, ans=0.0 2023-11-19 22:56:50,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=839406.6666666666, ans=0.1 2023-11-19 22:56:54,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=839473.3333333334, ans=0.125 2023-11-19 22:57:06,836 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 5700, loss[loss=0.07622, simple_loss=0.1022, pruned_loss=0.01729, audio_tagging_loss=0.007822, over 15860.00 frames. ], tot_loss[loss=0.08321, simple_loss=0.1023, pruned_loss=0.02153, audio_tagging_loss=0.01054, over 3048593.97 frames. ], batch size: 56, lr: 6.31e-03, grad_scale: 32.0 2023-11-19 22:57:08,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=839540.0, ans=0.125 2023-11-19 22:57:12,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=839540.0, ans=0.0 2023-11-19 22:57:16,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=839540.0, ans=0.125 2023-11-19 22:57:21,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=839606.6666666666, ans=0.0 2023-11-19 22:57:29,110 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 125950 2023-11-19 22:57:31,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=839673.3333333334, ans=0.125 2023-11-19 22:57:34,477 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.315e+01 7.815e+01 8.868e+01 9.889e+01 1.263e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-19 22:57:43,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=839673.3333333334, ans=0.0 2023-11-19 22:57:50,121 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0 2023-11-19 22:57:52,945 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.58 vs.
limit=22.5 2023-11-19 22:57:58,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=839806.6666666666, ans=0.125 2023-11-19 22:58:08,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=839806.6666666666, ans=0.0 2023-11-19 22:58:10,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=839873.3333333334, ans=0.1 2023-11-19 22:58:11,320 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 5750, loss[loss=0.09032, simple_loss=0.1168, pruned_loss=0.02452, audio_tagging_loss=0.007407, over 15671.00 frames. ], tot_loss[loss=0.08324, simple_loss=0.1023, pruned_loss=0.02169, audio_tagging_loss=0.01039, over 3043466.42 frames. ], batch size: 58, lr: 6.31e-03, grad_scale: 16.0 2023-11-19 22:58:33,805 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 126000 2023-11-19 22:58:38,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=840006.6666666666, ans=0.1 2023-11-19 22:58:40,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=840006.6666666666, ans=0.125 2023-11-19 22:59:14,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=840140.0, ans=0.1 2023-11-19 22:59:17,524 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 5800, loss[loss=0.07152, simple_loss=0.08626, pruned_loss=0.01744, audio_tagging_loss=0.01095, over 15370.00 frames. ], tot_loss[loss=0.08261, simple_loss=0.1016, pruned_loss=0.02151, audio_tagging_loss=0.01029, over 3041658.94 frames. ], batch size: 58, lr: 6.30e-03, grad_scale: 16.0 2023-11-19 22:59:39,020 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 126050 2023-11-19 22:59:41,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=840340.0, ans=0.05 2023-11-19 22:59:43,816 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.582e+01 8.258e+01 8.850e+01 9.872e+01 1.564e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-19 22:59:57,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=840406.6666666666, ans=0.2 2023-11-19 23:00:22,525 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 5850, loss[loss=0.1123, simple_loss=0.1516, pruned_loss=0.02813, audio_tagging_loss=0.008401, over 16122.00 frames. ], tot_loss[loss=0.08324, simple_loss=0.1025, pruned_loss=0.02179, audio_tagging_loss=0.0102, over 3046151.05 frames. 
], batch size: 55, lr: 6.30e-03, grad_scale: 16.0 2023-11-19 23:00:44,889 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 126100 2023-11-19 23:00:48,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=840673.3333333334, ans=0.05 2023-11-19 23:01:01,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=840740.0, ans=0.1 2023-11-19 23:01:15,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=840806.6666666666, ans=0.125 2023-11-19 23:01:18,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=840806.6666666666, ans=0.125 2023-11-19 23:01:20,797 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.613e-01 2023-11-19 23:01:27,275 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 5900, loss[loss=0.07306, simple_loss=0.09029, pruned_loss=0.0178, audio_tagging_loss=0.01012, over 15069.00 frames. ], tot_loss[loss=0.08277, simple_loss=0.1021, pruned_loss=0.02155, audio_tagging_loss=0.01017, over 3046923.16 frames. ], batch size: 56, lr: 6.30e-03, grad_scale: 16.0 2023-11-19 23:01:36,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=840873.3333333334, ans=0.125 2023-11-19 23:01:43,672 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 23:01:49,501 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 126150 2023-11-19 23:01:54,166 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.488e+01 8.354e+01 9.139e+01 9.974e+01 1.416e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-19 23:02:03,779 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 23:02:32,590 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 5950, loss[loss=0.0675, simple_loss=0.08349, pruned_loss=0.01626, audio_tagging_loss=0.009493, over 15487.00 frames. ], tot_loss[loss=0.08254, simple_loss=0.1022, pruned_loss=0.0213, audio_tagging_loss=0.01013, over 3040955.85 frames. ], batch size: 58, lr: 6.30e-03, grad_scale: 16.0 2023-11-19 23:02:36,575 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 23:02:41,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=841206.6666666666, ans=0.04949747468305833 2023-11-19 23:02:53,896 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 126200 2023-11-19 23:03:00,961 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.93 vs. limit=15.0 2023-11-19 23:03:36,322 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 6000, loss[loss=0.08322, simple_loss=0.1097, pruned_loss=0.01922, audio_tagging_loss=0.009164, over 15410.00 frames. ], tot_loss[loss=0.08247, simple_loss=0.1022, pruned_loss=0.0212, audio_tagging_loss=0.01019, over 3040750.61 frames. 
], batch size: 57, lr: 6.30e-03, grad_scale: 32.0 2023-11-19 23:03:36,325 INFO [train_asr.py:1285] (0/4) Computing validation loss 2023-11-19 23:04:18,351 INFO [train_asr.py:1294] (0/4) Epoch 11, validation: loss=0.06364, simple_loss=0.05477, pruned_loss=0.006179, audio_tagging_loss=0.03008, over 4681554.00 frames. 2023-11-19 23:04:18,352 INFO [train_asr.py:1295] (0/4) Maximum memory allocated so far is 25925MB 2023-11-19 23:04:20,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=841540.0, ans=0.125 2023-11-19 23:04:40,323 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 126250 2023-11-19 23:04:45,163 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.220e+01 8.140e+01 8.896e+01 9.801e+01 1.425e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-19 23:04:50,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=841673.3333333334, ans=0.125 2023-11-19 23:04:57,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=841740.0, ans=0.125 2023-11-19 23:04:58,611 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.27 vs. limit=15.0 2023-11-19 23:05:07,228 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 23:05:17,351 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.69 vs. limit=22.5 2023-11-19 23:05:23,651 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 6050, loss[loss=0.09498, simple_loss=0.1199, pruned_loss=0.02313, audio_tagging_loss=0.01189, over 15817.00 frames. ], tot_loss[loss=0.08256, simple_loss=0.1021, pruned_loss=0.02125, audio_tagging_loss=0.01026, over 3044550.71 frames. ], batch size: 58, lr: 6.30e-03, grad_scale: 32.0 2023-11-19 23:05:23,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=841873.3333333334, ans=0.2 2023-11-19 23:05:42,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=841940.0, ans=0.0 2023-11-19 23:05:42,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=841940.0, ans=0.0 2023-11-19 23:05:45,488 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 126300 2023-11-19 23:05:51,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=842006.6666666666, ans=0.125 2023-11-19 23:05:53,391 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.44 vs. 
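limit=22.5

Note on the validation block at 23:03-23:04 above: at batch 6000 the loop pauses, runs the held-out set, logs a frame-weighted validation loss, then reports the CUDA allocator's high-water mark. A sketch of that pattern; model and valid_loader are placeholders, and the assumed model(batch) -> (loss, num_frames) interface is illustrative rather than the script's actual API:

    # Sketch: periodic validation plus peak-memory reporting.
    import logging
    import torch

    def run_validation(model, valid_loader, device: torch.device) -> float:
        logging.info("Computing validation loss")
        model.eval()
        total, frames = 0.0, 0
        with torch.no_grad():
            for batch in valid_loader:
                loss, num_frames = model(batch)  # assumed interface
                total += loss.item() * num_frames
                frames += num_frames
        model.train()
        avg = total / frames
        logging.info(f"validation: loss={avg:.4g}, over {frames} frames")
        # torch tracks the high-water mark of allocated CUDA memory:
        mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        logging.info(f"Maximum memory allocated so far is {mb}MB")
        return avg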
2023-11-19 23:06:16,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=842140.0, ans=0.1 2023-11-19 23:06:17,980 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.67 vs. limit=8.0 2023-11-19 23:06:28,641 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 6100, loss[loss=0.1181, simple_loss=0.1485, pruned_loss=0.03524, audio_tagging_loss=0.008616, over 15288.00 frames. ], tot_loss[loss=0.08241, simple_loss=0.1018, pruned_loss=0.0212, audio_tagging_loss=0.0103, over 3053878.53 frames. ], batch size: 54, lr: 6.30e-03, grad_scale: 32.0 2023-11-19 23:06:42,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=842273.3333333334, ans=0.2 2023-11-19 23:06:50,062 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 126350 2023-11-19 23:06:55,420 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.987e+01 8.655e+01 9.472e+01 1.043e+02 1.487e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-19 23:07:04,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=842340.0, ans=0.0 2023-11-19 23:07:10,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=842406.6666666666, ans=0.1 2023-11-19 23:07:13,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=842406.6666666666, ans=0.125 2023-11-19 23:07:16,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=842406.6666666666, ans=0.125 2023-11-19 23:07:21,091 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.87 vs. limit=22.5 2023-11-19 23:07:25,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=842473.3333333334, ans=0.09899494936611666 2023-11-19 23:07:32,567 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 6150, loss[loss=0.103, simple_loss=0.1298, pruned_loss=0.02676, audio_tagging_loss=0.01131, over 14976.00 frames. ], tot_loss[loss=0.0829, simple_loss=0.1026, pruned_loss=0.02128, audio_tagging_loss=0.01029, over 3053045.97 frames. ], batch size: 56, lr: 6.30e-03, grad_scale: 32.0 2023-11-19 23:07:32,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=842540.0, ans=0.125 2023-11-19 23:07:45,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=842606.6666666666, ans=0.1 2023-11-19 23:07:55,473 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 126400 2023-11-19 23:07:57,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=842606.6666666666, ans=0.125 2023-11-19 23:08:17,016 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.57 vs.
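limit=15.0

Note on the Whitening records (such as the feed_forward2.out_whiten one just above): each compares a statistic of a module's activation covariance against a whitening_limit, which is itself a ScheduledFloat, and a corrective gradient appears to kick in only once the statistic exceeds the limit; most logged values sit below it. A rough illustration, assuming the metric is the largest covariance eigenvalue over the mean eigenvalue (1.0 for perfectly white features); the actual metric in scaling.py is defined differently:

    # Sketch: measure how far a module's activations are from "white".
    import torch

    def whiteness_metric(feats: torch.Tensor) -> float:
        # feats: (num_frames, num_channels) activations from one module
        feats = feats - feats.mean(dim=0)
        cov = (feats.T @ feats) / feats.shape[0]
        eigs = torch.linalg.eigvalsh(cov)  # ascending; cov is symmetric
        return (eigs[-1] / eigs.mean()).item()

    x = torch.randn(1000, 384)  # near-white activations
    print(f"metric={whiteness_metric(x):.2f} vs. limit=15.0")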
2023-11-19 23:08:17,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=842740.0, ans=0.125 2023-11-19 23:08:25,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=842806.6666666666, ans=0.2 2023-11-19 23:08:39,211 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 6200, loss[loss=0.0591, simple_loss=0.07036, pruned_loss=0.01462, audio_tagging_loss=0.009305, over 15203.00 frames. ], tot_loss[loss=0.08336, simple_loss=0.1031, pruned_loss=0.02148, audio_tagging_loss=0.01035, over 3054207.42 frames. ], batch size: 58, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:08:55,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=842940.0, ans=10.0 2023-11-19 23:09:00,703 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 126450 2023-11-19 23:09:01,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=842940.0, ans=0.0 2023-11-19 23:09:02,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=842940.0, ans=10.0 2023-11-19 23:09:05,393 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.030e+01 8.350e+01 8.922e+01 9.859e+01 1.209e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-19 23:09:42,307 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 6250, loss[loss=0.1117, simple_loss=0.1433, pruned_loss=0.03083, audio_tagging_loss=0.009206, over 15020.00 frames. ], tot_loss[loss=0.0833, simple_loss=0.1028, pruned_loss=0.02148, audio_tagging_loss=0.01045, over 3045374.24 frames. ], batch size: 57, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:09:58,517 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.17 vs. limit=6.0 2023-11-19 23:10:04,662 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 126500 2023-11-19 23:10:44,103 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.15 vs. limit=22.5 2023-11-19 23:10:44,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=843473.3333333334, ans=0.125 2023-11-19 23:10:47,079 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 6300, loss[loss=0.09383, simple_loss=0.1127, pruned_loss=0.02904, audio_tagging_loss=0.008459, over 14426.00 frames. ], tot_loss[loss=0.08352, simple_loss=0.103, pruned_loss=0.02154, audio_tagging_loss=0.0105, over 3039641.34 frames. ], batch size: 53, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:10:52,021 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.15 vs.
limit=15.0 2023-11-19 23:10:55,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=843540.0, ans=0.0 2023-11-19 23:10:56,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=843540.0, ans=0.1 2023-11-19 23:10:59,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=843606.6666666666, ans=0.125 2023-11-19 23:11:09,964 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 126550 2023-11-19 23:11:14,953 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.441e+01 8.231e+01 8.988e+01 9.749e+01 1.273e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-19 23:11:29,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=843740.0, ans=0.125 2023-11-19 23:11:40,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=843806.6666666666, ans=0.1 2023-11-19 23:11:41,136 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.14 vs. limit=6.0 2023-11-19 23:11:50,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=843806.6666666666, ans=0.125 2023-11-19 23:11:52,580 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 6350, loss[loss=0.06165, simple_loss=0.0682, pruned_loss=0.0111, audio_tagging_loss=0.01644, over 14696.00 frames. ], tot_loss[loss=0.08326, simple_loss=0.1027, pruned_loss=0.02137, audio_tagging_loss=0.01056, over 3045663.73 frames. ], batch size: 56, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:12:07,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=843940.0, ans=0.125 2023-11-19 23:12:10,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=843940.0, ans=6.0 2023-11-19 23:12:14,925 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 126600 2023-11-19 23:12:32,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=844073.3333333334, ans=0.125 2023-11-19 23:12:47,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=844140.0, ans=0.0 2023-11-19 23:12:57,577 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 6400, loss[loss=0.08578, simple_loss=0.1017, pruned_loss=0.02525, audio_tagging_loss=0.009688, over 14524.00 frames. ], tot_loss[loss=0.08339, simple_loss=0.1025, pruned_loss=0.02156, audio_tagging_loss=0.01058, over 3042374.14 frames. 
], batch size: 55, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:13:13,032 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 23:13:13,124 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.460e-01 2023-11-19 23:13:15,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=844273.3333333334, ans=0.0 2023-11-19 23:13:16,718 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 23:13:19,173 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 126650 2023-11-19 23:13:25,602 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.681e+01 8.306e+01 8.849e+01 9.605e+01 1.260e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-19 23:13:35,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=844406.6666666666, ans=0.0 2023-11-19 23:13:36,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=844406.6666666666, ans=0.2 2023-11-19 23:13:44,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=844406.6666666666, ans=0.125 2023-11-19 23:14:01,787 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 6450, loss[loss=0.05724, simple_loss=0.06542, pruned_loss=0.0113, audio_tagging_loss=0.01322, over 14287.00 frames. ], tot_loss[loss=0.08442, simple_loss=0.104, pruned_loss=0.02191, audio_tagging_loss=0.01054, over 3045433.46 frames. ], batch size: 54, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:14:13,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=844606.6666666666, ans=0.125 2023-11-19 23:14:13,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=844606.6666666666, ans=0.2 2023-11-19 23:14:21,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=844606.6666666666, ans=0.0 2023-11-19 23:14:24,058 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 126700 2023-11-19 23:14:33,947 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.83 vs. limit=15.0 2023-11-19 23:14:34,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=844673.3333333334, ans=0.0 2023-11-19 23:14:45,175 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 23:14:45,798 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.08 vs. 
limit=12.0 2023-11-19 23:14:46,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=844740.0, ans=0.0 2023-11-19 23:14:55,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=844806.6666666666, ans=0.025 2023-11-19 23:15:06,441 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 6500, loss[loss=0.08381, simple_loss=0.09299, pruned_loss=0.0238, audio_tagging_loss=0.01351, over 14540.00 frames. ], tot_loss[loss=0.08342, simple_loss=0.1027, pruned_loss=0.02149, audio_tagging_loss=0.01057, over 3050443.13 frames. ], batch size: 55, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:15:24,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=844940.0, ans=0.125 2023-11-19 23:15:29,799 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 126750 2023-11-19 23:15:35,914 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.834e+01 8.115e+01 8.787e+01 9.556e+01 1.431e+02, threshold=1.757e+02, percent-clipped=0.0 2023-11-19 23:15:43,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=845006.6666666666, ans=0.125 2023-11-19 23:16:05,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=845140.0, ans=0.07 2023-11-19 23:16:12,383 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 6550, loss[loss=0.09305, simple_loss=0.1192, pruned_loss=0.02449, audio_tagging_loss=0.008974, over 15407.00 frames. ], tot_loss[loss=0.08302, simple_loss=0.1026, pruned_loss=0.02125, audio_tagging_loss=0.01047, over 3049743.33 frames. ], batch size: 60, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:16:17,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=845206.6666666666, ans=0.0 2023-11-19 23:16:34,033 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 126800 2023-11-19 23:16:47,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=845340.0, ans=0.125 2023-11-19 23:16:56,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=845406.6666666666, ans=0.125 2023-11-19 23:16:59,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=845406.6666666666, ans=0.125 2023-11-19 23:17:10,559 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.74 vs. limit=15.0 2023-11-19 23:17:17,387 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 6600, loss[loss=0.07903, simple_loss=0.09835, pruned_loss=0.02072, audio_tagging_loss=0.00913, over 16137.00 frames. ], tot_loss[loss=0.0834, simple_loss=0.1032, pruned_loss=0.02151, audio_tagging_loss=0.01028, over 3045401.33 frames. 
], batch size: 60, lr: 6.28e-03, grad_scale: 32.0 2023-11-19 23:17:18,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=845540.0, ans=0.0 2023-11-19 23:17:32,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=845606.6666666666, ans=0.125 2023-11-19 23:17:40,141 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 126850 2023-11-19 23:17:46,103 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.644e+01 8.323e+01 8.980e+01 9.690e+01 1.359e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-19 23:18:04,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=845740.0, ans=0.015 2023-11-19 23:18:12,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=845806.6666666666, ans=0.125 2023-11-19 23:18:14,845 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.59 vs. limit=15.0 2023-11-19 23:18:22,267 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 6650, loss[loss=0.1148, simple_loss=0.1435, pruned_loss=0.03652, audio_tagging_loss=0.006553, over 14857.00 frames. ], tot_loss[loss=0.08354, simple_loss=0.1033, pruned_loss=0.02167, audio_tagging_loss=0.01025, over 3045441.71 frames. ], batch size: 55, lr: 6.28e-03, grad_scale: 32.0 2023-11-19 23:18:30,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=845873.3333333334, ans=0.1 2023-11-19 23:18:33,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=845873.3333333334, ans=0.125 2023-11-19 23:18:36,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=845940.0, ans=0.0 2023-11-19 23:18:43,956 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 126900 2023-11-19 23:18:51,622 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.62 vs. limit=15.0 2023-11-19 23:19:17,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=846140.0, ans=0.125 2023-11-19 23:19:17,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=846140.0, ans=0.125 2023-11-19 23:19:20,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=846140.0, ans=0.125 2023-11-19 23:19:26,487 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.37 vs. limit=6.0 2023-11-19 23:19:26,928 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 6700, loss[loss=0.099, simple_loss=0.1229, pruned_loss=0.02618, audio_tagging_loss=0.01137, over 15198.00 frames. ], tot_loss[loss=0.08389, simple_loss=0.104, pruned_loss=0.0218, audio_tagging_loss=0.01011, over 3036729.18 frames. 
], batch size: 58, lr: 6.28e-03, grad_scale: 16.0 2023-11-19 23:19:49,124 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 126950 2023-11-19 23:19:57,063 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.564e+01 8.084e+01 8.667e+01 9.126e+01 1.226e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-19 23:20:16,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=846406.6666666666, ans=0.025 2023-11-19 23:20:32,094 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 6750, loss[loss=0.0766, simple_loss=0.09123, pruned_loss=0.01749, audio_tagging_loss=0.0135, over 15510.00 frames. ], tot_loss[loss=0.08361, simple_loss=0.1039, pruned_loss=0.02163, audio_tagging_loss=0.01002, over 3028504.47 frames. ], batch size: 58, lr: 6.28e-03, grad_scale: 16.0 2023-11-19 23:20:53,327 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 127000 2023-11-19 23:21:04,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=846673.3333333334, ans=0.2 2023-11-19 23:21:36,292 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 6800, loss[loss=0.07897, simple_loss=0.09313, pruned_loss=0.01971, audio_tagging_loss=0.01269, over 14622.00 frames. ], tot_loss[loss=0.08338, simple_loss=0.1035, pruned_loss=0.02153, audio_tagging_loss=0.01009, over 3023218.84 frames. ], batch size: 56, lr: 6.28e-03, grad_scale: 32.0 2023-11-19 23:21:45,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=846873.3333333334, ans=0.125 2023-11-19 23:21:50,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=846940.0, ans=0.07 2023-11-19 23:21:54,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=846940.0, ans=0.1 2023-11-19 23:21:57,892 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 127050 2023-11-19 23:22:05,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=847006.6666666666, ans=0.125 2023-11-19 23:22:05,920 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.069e+01 8.284e+01 8.995e+01 1.009e+02 1.556e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-19 23:22:07,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=847006.6666666666, ans=0.0 2023-11-19 23:22:09,200 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.49 vs. limit=15.0 2023-11-19 23:22:23,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=847073.3333333334, ans=0.125 2023-11-19 23:22:41,143 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 6850, loss[loss=0.1046, simple_loss=0.1359, pruned_loss=0.02953, audio_tagging_loss=0.007081, over 16159.00 frames. ], tot_loss[loss=0.08328, simple_loss=0.1035, pruned_loss=0.02149, audio_tagging_loss=0.01004, over 3031971.90 frames. 
], batch size: 62, lr: 6.28e-03, grad_scale: 32.0 2023-11-19 23:22:49,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=847206.6666666666, ans=0.1 2023-11-19 23:23:03,154 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 127100 2023-11-19 23:23:13,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=847340.0, ans=0.1 2023-11-19 23:23:38,050 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.68 vs. limit=22.5 2023-11-19 23:23:44,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=847540.0, ans=0.125 2023-11-19 23:23:45,612 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 6900, loss[loss=0.0676, simple_loss=0.07588, pruned_loss=0.01672, audio_tagging_loss=0.01294, over 15082.00 frames. ], tot_loss[loss=0.08329, simple_loss=0.1037, pruned_loss=0.02144, audio_tagging_loss=0.009985, over 3039112.68 frames. ], batch size: 58, lr: 6.28e-03, grad_scale: 32.0 2023-11-19 23:23:56,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=847540.0, ans=0.035 2023-11-19 23:24:07,962 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 127150 2023-11-19 23:24:16,951 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.980e+01 8.238e+01 8.922e+01 9.730e+01 1.552e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-19 23:24:19,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=847673.3333333334, ans=0.0 2023-11-19 23:24:37,210 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 23:24:38,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=847806.6666666666, ans=0.025 2023-11-19 23:24:49,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=847873.3333333334, ans=0.125 2023-11-19 23:24:50,539 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 6950, loss[loss=0.07097, simple_loss=0.08554, pruned_loss=0.01452, audio_tagging_loss=0.01368, over 15638.00 frames. ], tot_loss[loss=0.08333, simple_loss=0.1035, pruned_loss=0.02146, audio_tagging_loss=0.01012, over 3043987.68 frames. 
], batch size: 58, lr: 6.28e-03, grad_scale: 16.0 2023-11-19 23:24:58,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=847873.3333333334, ans=0.125 2023-11-19 23:25:02,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=847940.0, ans=0.125 2023-11-19 23:25:12,295 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 127200 2023-11-19 23:25:35,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=848073.3333333334, ans=0.0 2023-11-19 23:25:44,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=848140.0, ans=0.0 2023-11-19 23:25:50,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=848140.0, ans=0.04949747468305833 2023-11-19 23:25:55,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=848206.6666666666, ans=0.2 2023-11-19 23:25:55,837 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 7000, loss[loss=0.08955, simple_loss=0.1087, pruned_loss=0.0238, audio_tagging_loss=0.01139, over 15010.00 frames. ], tot_loss[loss=0.08416, simple_loss=0.1045, pruned_loss=0.02176, audio_tagging_loss=0.01018, over 3042818.91 frames. ], batch size: 56, lr: 6.27e-03, grad_scale: 16.0 2023-11-19 23:26:00,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=848206.6666666666, ans=0.125 2023-11-19 23:26:08,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=848273.3333333334, ans=0.125 2023-11-19 23:26:11,832 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.53 vs. limit=12.0 2023-11-19 23:26:15,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=848273.3333333334, ans=0.0 2023-11-19 23:26:17,218 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 127250 2023-11-19 23:26:18,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=848273.3333333334, ans=0.1 2023-11-19 23:26:23,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=848340.0, ans=0.2 2023-11-19 23:26:26,377 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.925e+01 8.193e+01 9.050e+01 1.000e+02 1.255e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-19 23:26:55,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=848473.3333333334, ans=0.125 2023-11-19 23:27:00,068 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 7050, loss[loss=0.05932, simple_loss=0.07041, pruned_loss=0.01261, audio_tagging_loss=0.0115, over 15484.00 frames. ], tot_loss[loss=0.08432, simple_loss=0.1045, pruned_loss=0.02189, audio_tagging_loss=0.01015, over 3048213.92 frames. 
], batch size: 60, lr: 6.27e-03, grad_scale: 16.0 2023-11-19 23:27:00,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=848540.0, ans=0.125 2023-11-19 23:27:21,623 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.75 vs. limit=15.0 2023-11-19 23:27:22,396 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 127300 2023-11-19 23:27:42,323 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.93 vs. limit=12.0 2023-11-19 23:27:54,818 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.61 vs. limit=22.5 2023-11-19 23:28:03,962 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 7100, loss[loss=0.08783, simple_loss=0.1107, pruned_loss=0.02291, audio_tagging_loss=0.009592, over 14738.00 frames. ], tot_loss[loss=0.08405, simple_loss=0.1041, pruned_loss=0.0218, audio_tagging_loss=0.0102, over 3049343.44 frames. ], batch size: 58, lr: 6.27e-03, grad_scale: 16.0 2023-11-19 23:28:26,305 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 127350 2023-11-19 23:28:34,693 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.672e+01 8.042e+01 8.769e+01 9.719e+01 1.215e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-19 23:28:40,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=849006.6666666666, ans=0.125 2023-11-19 23:28:44,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=849073.3333333334, ans=0.1 2023-11-19 23:28:44,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=849073.3333333334, ans=0.125 2023-11-19 23:28:51,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=849073.3333333334, ans=0.125 2023-11-19 23:29:00,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=849140.0, ans=0.1 2023-11-19 23:29:03,319 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.32 vs. limit=22.5 2023-11-19 23:29:09,327 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 7150, loss[loss=0.08556, simple_loss=0.1066, pruned_loss=0.02339, audio_tagging_loss=0.008881, over 13618.00 frames. ], tot_loss[loss=0.08529, simple_loss=0.1059, pruned_loss=0.02218, audio_tagging_loss=0.01015, over 3047191.48 frames. ], batch size: 52, lr: 6.27e-03, grad_scale: 16.0 2023-11-19 23:29:29,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=849273.3333333334, ans=0.125 2023-11-19 23:29:30,725 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 127400 2023-11-19 23:29:32,718 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.73 vs. limit=15.0 2023-11-19 23:29:38,943 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.25 vs. 
2023-11-19 23:29:56,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=849406.6666666666, ans=0.1
2023-11-19 23:30:01,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=849473.3333333334, ans=0.125
2023-11-19 23:30:02,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=849473.3333333334, ans=0.125
2023-11-19 23:30:03,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=849473.3333333334, ans=0.125
2023-11-19 23:30:13,337 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 7200, loss[loss=0.07746, simple_loss=0.1009, pruned_loss=0.01584, audio_tagging_loss=0.01116, over 13491.00 frames. ], tot_loss[loss=0.08535, simple_loss=0.1059, pruned_loss=0.02225, audio_tagging_loss=0.01016, over 3045267.02 frames. ], batch size: 53, lr: 6.27e-03, grad_scale: 32.0
2023-11-19 23:30:28,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=849606.6666666666, ans=0.1
2023-11-19 23:30:35,609 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 127450
2023-11-19 23:30:45,311 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 8.591e+01 9.847e+01 1.111e+02 1.455e+02, threshold=1.969e+02, percent-clipped=0.0
2023-11-19 23:31:18,090 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 7250, loss[loss=0.0915, simple_loss=0.1114, pruned_loss=0.02521, audio_tagging_loss=0.01059, over 14931.00 frames. ], tot_loss[loss=0.08455, simple_loss=0.1048, pruned_loss=0.02186, audio_tagging_loss=0.0103, over 3043282.43 frames. ], batch size: 56, lr: 6.27e-03, grad_scale: 32.0
2023-11-19 23:31:22,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=849873.3333333334, ans=0.0
2023-11-19 23:31:33,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=849940.0, ans=0.04949747468305833
2023-11-19 23:31:37,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=849940.0, ans=0.125
2023-11-19 23:31:38,928 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.87 vs. limit=22.5
2023-11-19 23:31:40,813 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 127500
2023-11-19 23:31:42,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=849940.0, ans=0.125
2023-11-19 23:31:49,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=850006.6666666666, ans=0.2
2023-11-19 23:31:53,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=850006.6666666666, ans=0.05
2023-11-19 23:31:55,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=850006.6666666666, ans=0.125
2023-11-19 23:32:23,472 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 7300, loss[loss=0.07966, simple_loss=0.09808, pruned_loss=0.01827, audio_tagging_loss=0.01235, over 14806.00 frames. ], tot_loss[loss=0.08432, simple_loss=0.1044, pruned_loss=0.02189, audio_tagging_loss=0.0102, over 3042767.45 frames. ], batch size: 58, lr: 6.27e-03, grad_scale: 32.0
2023-11-19 23:32:24,168 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.05 vs. limit=22.5
2023-11-19 23:32:26,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=850206.6666666666, ans=0.125
2023-11-19 23:32:31,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=850206.6666666666, ans=0.125
2023-11-19 23:32:45,022 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 127550
2023-11-19 23:32:53,437 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.382e+01 8.309e+01 8.798e+01 9.625e+01 1.232e+02, threshold=1.760e+02, percent-clipped=0.0
2023-11-19 23:33:07,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=850406.6666666666, ans=0.1
2023-11-19 23:33:11,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=850406.6666666666, ans=0.125
2023-11-19 23:33:20,784 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.71 vs. limit=15.0
2023-11-19 23:33:21,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=850473.3333333334, ans=10.0
2023-11-19 23:33:27,414 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 7350, loss[loss=0.08129, simple_loss=0.08593, pruned_loss=0.02499, audio_tagging_loss=0.01333, over 15154.00 frames. ], tot_loss[loss=0.08437, simple_loss=0.1047, pruned_loss=0.02195, audio_tagging_loss=0.01004, over 3044904.17 frames. ], batch size: 57, lr: 6.27e-03, grad_scale: 32.0
2023-11-19 23:33:30,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=850540.0, ans=0.2
2023-11-19 23:33:48,897 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 127600
2023-11-19 23:33:48,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=850606.6666666666, ans=0.2
2023-11-19 23:33:53,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=850673.3333333334, ans=0.2
2023-11-19 23:33:59,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=850673.3333333334, ans=0.0
2023-11-19 23:34:25,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=850806.6666666666, ans=0.0
2023-11-19 23:34:26,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=850806.6666666666, ans=0.0
2023-11-19 23:34:28,239 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.03 vs. limit=15.0
2023-11-19 23:34:31,291 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 7400, loss[loss=0.09042, simple_loss=0.117, pruned_loss=0.02118, audio_tagging_loss=0.01076, over 15532.00 frames. ], tot_loss[loss=0.08472, simple_loss=0.1053, pruned_loss=0.02211, audio_tagging_loss=0.009945, over 3046841.29 frames. ], batch size: 55, lr: 6.26e-03, grad_scale: 32.0
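The "loss[...]"/"tot_loss[...]" fields in the train_asr.py:1262 entries are consistent with a weighted sum of the logged components. The sketch below infers the 0.5 and 1.0 weights by fitting the logged numbers themselves; it is a reconstruction, not the recipe's actual code.

```python
# Assumed combination of the per-batch loss components, checked against the log.
def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                  simple_scale=0.5, tagging_scale=1.0):
    return simple_scale * simple_loss + pruned_loss + tagging_scale * audio_tagging_loss

# Batch 7400's running totals above: loss=0.08472, simple_loss=0.1053,
# pruned_loss=0.02211, audio_tagging_loss=0.009945.
print(combined_loss(0.1053, 0.02211, 0.009945))
# -> 0.084705, matching the logged loss=0.08472 to within the
#    4-significant-digit rounding of the printed components.
```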
2023-11-19 23:34:37,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=850873.3333333334, ans=0.0
2023-11-19 23:34:43,761 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.01 vs. limit=12.0
2023-11-19 23:34:54,180 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 127650
2023-11-19 23:34:54,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=850940.0, ans=0.1
2023-11-19 23:35:02,496 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.769e+01 8.438e+01 8.975e+01 9.970e+01 1.315e+02, threshold=1.795e+02, percent-clipped=0.0
2023-11-19 23:35:31,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=851140.0, ans=0.0
2023-11-19 23:35:35,668 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 7450, loss[loss=0.06778, simple_loss=0.08342, pruned_loss=0.01406, audio_tagging_loss=0.01202, over 15115.00 frames. ], tot_loss[loss=0.08382, simple_loss=0.1041, pruned_loss=0.0218, audio_tagging_loss=0.009961, over 3047527.14 frames. ], batch size: 57, lr: 6.26e-03, grad_scale: 32.0
2023-11-19 23:35:35,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=851206.6666666666, ans=0.125
2023-11-19 23:35:38,974 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.030e-01
2023-11-19 23:35:43,560 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.23 vs. limit=15.0
2023-11-19 23:35:46,040 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.86 vs. limit=12.0
2023-11-19 23:35:57,593 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 127700
2023-11-19 23:36:20,518 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.53 vs. limit=15.0
2023-11-19 23:36:40,549 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 7500, loss[loss=0.1015, simple_loss=0.1284, pruned_loss=0.02837, audio_tagging_loss=0.00893, over 15661.00 frames. ], tot_loss[loss=0.08375, simple_loss=0.1042, pruned_loss=0.02185, audio_tagging_loss=0.009812, over 3049421.43 frames. ], batch size: 57, lr: 6.26e-03, grad_scale: 32.0
2023-11-19 23:36:57,953 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=22.5
2023-11-19 23:37:02,018 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 127750
2023-11-19 23:37:04,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=851673.3333333334, ans=0.0
2023-11-19 23:37:11,629 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.65 vs. limit=15.0
2023-11-19 23:37:12,199 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.986e+01 8.268e+01 8.982e+01 9.702e+01 1.380e+02, threshold=1.796e+02, percent-clipped=0.0
2023-11-19 23:37:15,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=851673.3333333334, ans=0.125
2023-11-19 23:37:19,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=851740.0, ans=0.2
2023-11-19 23:37:40,258 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.84 vs. limit=22.5
2023-11-19 23:37:44,454 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 7550, loss[loss=0.08769, simple_loss=0.1132, pruned_loss=0.02096, audio_tagging_loss=0.01011, over 14678.00 frames. ], tot_loss[loss=0.084, simple_loss=0.1044, pruned_loss=0.02193, audio_tagging_loss=0.009855, over 3047858.82 frames. ], batch size: 55, lr: 6.26e-03, grad_scale: 16.0
2023-11-19 23:38:04,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=851940.0, ans=0.0
2023-11-19 23:38:06,328 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 127800
2023-11-19 23:38:22,831 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.76 vs. limit=15.0
2023-11-19 23:38:26,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=852073.3333333334, ans=0.0
2023-11-19 23:38:41,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=852140.0, ans=0.0
2023-11-19 23:38:48,614 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 7600, loss[loss=0.06798, simple_loss=0.08333, pruned_loss=0.01503, audio_tagging_loss=0.01128, over 15645.00 frames. ], tot_loss[loss=0.08337, simple_loss=0.1033, pruned_loss=0.02178, audio_tagging_loss=0.009923, over 3043377.41 frames. ], batch size: 60, lr: 6.26e-03, grad_scale: 32.0
2023-11-19 23:38:48,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=852206.6666666666, ans=0.0
2023-11-19 23:39:02,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=852273.3333333334, ans=0.0
2023-11-19 23:39:10,699 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 127850
2023-11-19 23:39:20,829 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.143e+01 8.303e+01 8.868e+01 9.604e+01 1.243e+02, threshold=1.774e+02, percent-clipped=0.0
2023-11-19 23:39:27,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=852406.6666666666, ans=0.0
2023-11-19 23:39:39,158 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.97 vs. limit=22.5
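In every optim.py:476 entry above, the logged threshold is exactly Clipping_scale times the logged median (e.g. 2.0 x 8.982e+01 = 1.796e+02), which suggests clipping against a running median of recent gradient norms. A minimal sketch under that assumption:

```python
# Sketch of the statistic behind "Clipping_scale=2.0, grad-norm quartiles
# min/25%/50%/75%/max, threshold=..., percent-clipped=...". Assumed behaviour:
# keep a window of recent global grad norms, log its quartiles, and clip when
# the current norm exceeds clipping_scale times the window median.
import torch

def clip_by_running_median(params, recent_norms, clipping_scale=2.0):
    grads = [p.grad for p in params if p.grad is not None]
    norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
    recent_norms.append(norm)
    window = torch.tensor(recent_norms[-200:])  # window size is an assumption
    quartiles = torch.quantile(window, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2].item()  # 2.0 x median
    clipped = norm > threshold
    if clipped:
        for g in grads:
            g.mul_(threshold / norm)
    return quartiles.tolist(), threshold, clipped
```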
2023-11-19 23:39:40,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=852473.3333333334, ans=0.0
2023-11-19 23:39:47,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=852473.3333333334, ans=0.1
2023-11-19 23:39:52,476 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 7650, loss[loss=0.06642, simple_loss=0.08364, pruned_loss=0.01426, audio_tagging_loss=0.01034, over 15155.00 frames. ], tot_loss[loss=0.08255, simple_loss=0.102, pruned_loss=0.02153, audio_tagging_loss=0.01001, over 3045276.22 frames. ], batch size: 57, lr: 6.26e-03, grad_scale: 32.0
2023-11-19 23:40:14,501 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 127900
2023-11-19 23:40:16,209 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.40 vs. limit=10.0
2023-11-19 23:40:44,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=852806.6666666666, ans=0.0
2023-11-19 23:40:57,029 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 7700, loss[loss=0.07114, simple_loss=0.09741, pruned_loss=0.01615, audio_tagging_loss=0.006283, over 14709.00 frames. ], tot_loss[loss=0.08265, simple_loss=0.1025, pruned_loss=0.02144, audio_tagging_loss=0.009986, over 3044600.01 frames. ], batch size: 54, lr: 6.26e-03, grad_scale: 16.0
2023-11-19 23:40:57,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=852873.3333333334, ans=0.0
2023-11-19 23:41:19,347 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 127950
2023-11-19 23:41:31,372 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.499e+01 8.407e+01 9.041e+01 9.727e+01 1.362e+02, threshold=1.808e+02, percent-clipped=0.0
2023-11-19 23:41:46,675 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.91 vs. limit=15.0
2023-11-19 23:41:53,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=853140.0, ans=0.125
2023-11-19 23:42:01,279 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 7750, loss[loss=0.09366, simple_loss=0.1202, pruned_loss=0.0243, audio_tagging_loss=0.009265, over 15607.00 frames. ], tot_loss[loss=0.08387, simple_loss=0.1041, pruned_loss=0.02182, audio_tagging_loss=0.009984, over 3049899.60 frames. ], batch size: 56, lr: 6.26e-03, grad_scale: 8.0
2023-11-19 23:42:08,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=853206.6666666666, ans=0.1
2023-11-19 23:42:16,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=853273.3333333334, ans=0.0
2023-11-19 23:42:22,881 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 128000
2023-11-19 23:42:24,397 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-128000.pt
2023-11-19 23:42:39,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=853340.0, ans=0.125
2023-11-19 23:42:58,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=853473.3333333334, ans=0.2
2023-11-19 23:43:09,399 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 7800, loss[loss=0.08671, simple_loss=0.1037, pruned_loss=0.02508, audio_tagging_loss=0.009792, over 15818.00 frames. ], tot_loss[loss=0.08399, simple_loss=0.1043, pruned_loss=0.02174, audio_tagging_loss=0.01009, over 3054226.21 frames. ], batch size: 59, lr: 6.25e-03, grad_scale: 8.0
2023-11-19 23:43:17,293 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.07 vs. limit=15.0
2023-11-19 23:43:31,543 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 128050
2023-11-19 23:43:39,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=853673.3333333334, ans=0.0
2023-11-19 23:43:42,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=853673.3333333334, ans=0.0
2023-11-19 23:43:44,195 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.125e+01 8.217e+01 8.897e+01 9.655e+01 1.501e+02, threshold=1.779e+02, percent-clipped=0.0
2023-11-19 23:43:44,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=853673.3333333334, ans=0.0
2023-11-19 23:43:51,712 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.39 vs. limit=10.0
2023-11-19 23:43:57,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=853740.0, ans=0.125
2023-11-19 23:43:59,837 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 23:44:09,396 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.10 vs. limit=15.0
2023-11-19 23:44:14,124 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 7850, loss[loss=0.09052, simple_loss=0.113, pruned_loss=0.02242, audio_tagging_loss=0.01157, over 16891.00 frames. ], tot_loss[loss=0.08479, simple_loss=0.1053, pruned_loss=0.02205, audio_tagging_loss=0.01009, over 3063436.05 frames. ], batch size: 59, lr: 6.25e-03, grad_scale: 8.0
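The checkpoint.py:75 entry above fires at global batch 128,000, consistent with a fixed save interval that divides 128,000 (4000 is an assumption). A minimal sketch of such rank-0 periodic checkpointing; the function and key names are illustrative, not icefall's checkpoint.py API:

```python
from pathlib import Path
import torch

def maybe_save_checkpoint(model, optimizer, batch_idx_train: int,
                          exp_dir: Path, save_interval: int = 4000, rank: int = 0):
    # Only rank 0 writes, and only every save_interval batches.
    if rank != 0 or batch_idx_train % save_interval != 0:
        return
    ckpt = {
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "batch_idx_train": batch_idx_train,
    }
    # Produces filenames like .../checkpoint-128000.pt, as seen in the log.
    torch.save(ckpt, exp_dir / f"checkpoint-{batch_idx_train}.pt")
```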
2023-11-19 23:44:16,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=853873.3333333334, ans=0.125
2023-11-19 23:44:35,537 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 128100
2023-11-19 23:44:44,174 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.98 vs. limit=22.5
2023-11-19 23:44:52,596 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.64 vs. limit=10.0
2023-11-19 23:44:53,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=854073.3333333334, ans=0.1
2023-11-19 23:45:07,217 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.42 vs. limit=10.0
2023-11-19 23:45:17,510 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 7900, loss[loss=0.0815, simple_loss=0.09548, pruned_loss=0.02112, audio_tagging_loss=0.01264, over 15677.00 frames. ], tot_loss[loss=0.08503, simple_loss=0.1055, pruned_loss=0.02209, audio_tagging_loss=0.01022, over 3060328.80 frames. ], batch size: 59, lr: 6.25e-03, grad_scale: 8.0
2023-11-19 23:45:27,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=854206.6666666666, ans=0.1
2023-11-19 23:45:29,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=854273.3333333334, ans=0.0
2023-11-19 23:45:35,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=854273.3333333334, ans=0.2
2023-11-19 23:45:39,342 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 128150
2023-11-19 23:45:44,935 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.74 vs. limit=15.0
2023-11-19 23:45:52,897 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.807e+01 8.209e+01 8.987e+01 9.607e+01 1.593e+02, threshold=1.797e+02, percent-clipped=0.0
2023-11-19 23:45:53,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=854340.0, ans=0.1
2023-11-19 23:46:12,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=854473.3333333334, ans=0.0
2023-11-19 23:46:21,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=854540.0, ans=0.1
2023-11-19 23:46:22,324 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 7950, loss[loss=0.06978, simple_loss=0.08167, pruned_loss=0.01725, audio_tagging_loss=0.0117, over 15313.00 frames. ], tot_loss[loss=0.08403, simple_loss=0.1039, pruned_loss=0.02165, audio_tagging_loss=0.01043, over 3054163.74 frames. ], batch size: 56, lr: 6.25e-03, grad_scale: 8.0
2023-11-19 23:46:25,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=854540.0, ans=0.1
2023-11-19 23:46:26,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=854540.0, ans=0.125
2023-11-19 23:46:26,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=854540.0, ans=0.125
2023-11-19 23:46:31,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=854540.0, ans=0.0
2023-11-19 23:46:32,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=854540.0, ans=0.125
2023-11-19 23:46:36,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=854606.6666666666, ans=0.125
2023-11-19 23:46:37,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=854606.6666666666, ans=0.125
2023-11-19 23:46:38,902 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 23:46:44,301 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 128200
2023-11-19 23:46:46,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=854606.6666666666, ans=0.07
2023-11-19 23:46:59,510 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.71 vs. limit=6.0
2023-11-19 23:47:09,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=854740.0, ans=0.0
2023-11-19 23:47:21,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=854806.6666666666, ans=0.125
2023-11-19 23:47:26,473 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 8000, loss[loss=0.09896, simple_loss=0.1188, pruned_loss=0.03026, audio_tagging_loss=0.009313, over 14049.00 frames. ], tot_loss[loss=0.08426, simple_loss=0.1042, pruned_loss=0.02165, audio_tagging_loss=0.01051, over 3052445.95 frames. ], batch size: 54, lr: 6.25e-03, grad_scale: 16.0
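The WARNING above drops a 1-second AudioSet placeholder cut: 100 input frames shrink to 23 after subsampling, which is fewer than its 24 BPE tokens, so the transducer loss cannot align it. A sketch of such a filter; the subsampled-length formula is an assumption chosen to reproduce the logged 100 -> 23:

```python
def frames_after_subsampling(t: int) -> int:
    # Assumed convolutional-subsampling length formula; gives 100 -> 23.
    return ((t - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Transducer training needs at least as many output frames as tokens.
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23
print(keep_cut(100, 24))              # False -> the cut is excluded, as logged
```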
2023-11-19 23:47:26,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=854873.3333333334, ans=0.125
2023-11-19 23:47:29,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=854873.3333333334, ans=0.035
2023-11-19 23:47:37,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=854873.3333333334, ans=0.2
2023-11-19 23:47:49,110 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 128250
2023-11-19 23:48:01,610 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.413e+01 8.250e+01 9.015e+01 9.647e+01 1.325e+02, threshold=1.803e+02, percent-clipped=0.0
2023-11-19 23:48:31,449 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 8050, loss[loss=0.08285, simple_loss=0.1021, pruned_loss=0.02058, audio_tagging_loss=0.01121, over 15336.00 frames. ], tot_loss[loss=0.08512, simple_loss=0.1052, pruned_loss=0.0221, audio_tagging_loss=0.01043, over 3059125.11 frames. ], batch size: 56, lr: 6.25e-03, grad_scale: 16.0
2023-11-19 23:48:48,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=855273.3333333334, ans=0.2
2023-11-19 23:48:53,377 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 128300
2023-11-19 23:49:07,821 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.84 vs. limit=15.0
2023-11-19 23:49:12,767 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.78 vs. limit=22.5
2023-11-19 23:49:35,482 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 8100, loss[loss=0.08497, simple_loss=0.1102, pruned_loss=0.02206, audio_tagging_loss=0.007829, over 14256.00 frames. ], tot_loss[loss=0.0851, simple_loss=0.1052, pruned_loss=0.0222, audio_tagging_loss=0.01031, over 3051906.82 frames. ], batch size: 55, lr: 6.25e-03, grad_scale: 16.0
2023-11-19 23:49:55,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=855606.6666666666, ans=0.125
2023-11-19 23:49:56,800 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 128350
2023-11-19 23:50:06,407 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.15 vs. limit=22.5
2023-11-19 23:50:09,348 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.585e+01 8.405e+01 9.063e+01 9.959e+01 1.355e+02, threshold=1.813e+02, percent-clipped=0.0
2023-11-19 23:50:21,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=855740.0, ans=0.125
2023-11-19 23:50:27,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=855806.6666666666, ans=0.125
2023-11-19 23:50:30,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=855806.6666666666, ans=0.2
2023-11-19 23:50:38,110 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 8150, loss[loss=0.08334, simple_loss=0.1054, pruned_loss=0.01965, audio_tagging_loss=0.01097, over 15863.00 frames. ], tot_loss[loss=0.08515, simple_loss=0.1057, pruned_loss=0.0222, audio_tagging_loss=0.01011, over 3053003.97 frames. ], batch size: 58, lr: 6.25e-03, grad_scale: 16.0
2023-11-19 23:51:00,907 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 128400
2023-11-19 23:51:03,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=856006.6666666666, ans=0.1
2023-11-19 23:51:22,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=856073.3333333334, ans=0.125
2023-11-19 23:51:25,616 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.86 vs. limit=22.5
2023-11-19 23:51:28,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=856140.0, ans=0.125
2023-11-19 23:51:42,611 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 8200, loss[loss=0.07652, simple_loss=0.09115, pruned_loss=0.02237, audio_tagging_loss=0.008573, over 15702.00 frames. ], tot_loss[loss=0.08501, simple_loss=0.1056, pruned_loss=0.02215, audio_tagging_loss=0.01005, over 3046875.77 frames. ], batch size: 59, lr: 6.24e-03, grad_scale: 16.0
2023-11-19 23:51:45,064 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 23:51:52,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=856206.6666666666, ans=0.125
2023-11-19 23:52:05,196 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 128450
2023-11-19 23:52:17,389 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.991e+01 8.248e+01 8.899e+01 9.586e+01 1.451e+02, threshold=1.780e+02, percent-clipped=0.0
2023-11-19 23:52:22,296 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.18 vs. limit=12.0
2023-11-19 23:52:27,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=856406.6666666666, ans=0.0
2023-11-19 23:52:48,575 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 8250, loss[loss=0.08079, simple_loss=0.09304, pruned_loss=0.01989, audio_tagging_loss=0.01438, over 14602.00 frames. ], tot_loss[loss=0.08414, simple_loss=0.1047, pruned_loss=0.02178, audio_tagging_loss=0.01002, over 3047950.68 frames. ], batch size: 57, lr: 6.24e-03, grad_scale: 16.0
2023-11-19 23:52:56,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=856540.0, ans=0.2
2023-11-19 23:52:57,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=856540.0, ans=0.2
2023-11-19 23:53:01,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=856606.6666666666, ans=0.0
2023-11-19 23:53:09,873 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 128500
2023-11-19 23:53:29,035 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 23:53:29,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=856740.0, ans=0.125
2023-11-19 23:53:51,368 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 8300, loss[loss=0.1177, simple_loss=0.1464, pruned_loss=0.03618, audio_tagging_loss=0.008284, over 15387.00 frames. ], tot_loss[loss=0.08423, simple_loss=0.1046, pruned_loss=0.02191, audio_tagging_loss=0.01004, over 3049534.12 frames. ], batch size: 54, lr: 6.24e-03, grad_scale: 16.0
2023-11-19 23:53:59,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=856873.3333333334, ans=0.09899494936611666
2023-11-19 23:54:12,780 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 128550
2023-11-19 23:54:27,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=857006.6666666666, ans=0.125
2023-11-19 23:54:27,928 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.671e+01 8.283e+01 8.806e+01 9.666e+01 1.225e+02, threshold=1.761e+02, percent-clipped=0.0
2023-11-19 23:54:35,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=857073.3333333334, ans=0.125
2023-11-19 23:54:36,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=857073.3333333334, ans=15.0
2023-11-19 23:54:55,411 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 8350, loss[loss=0.06437, simple_loss=0.07979, pruned_loss=0.01309, audio_tagging_loss=0.01139, over 14769.00 frames. ], tot_loss[loss=0.08374, simple_loss=0.1041, pruned_loss=0.02166, audio_tagging_loss=0.01003, over 3051557.80 frames. ], batch size: 57, lr: 6.24e-03, grad_scale: 8.0
2023-11-19 23:54:58,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=857206.6666666666, ans=0.125
2023-11-19 23:54:59,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=857206.6666666666, ans=0.2
2023-11-19 23:55:12,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=857273.3333333334, ans=0.0
2023-11-19 23:55:15,292 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=15.0
2023-11-19 23:55:18,365 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 128600
2023-11-19 23:55:20,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=857273.3333333334, ans=0.035
2023-11-19 23:55:22,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=857340.0, ans=0.0
2023-11-19 23:55:24,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=857340.0, ans=0.0
2023-11-19 23:55:29,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=857340.0, ans=0.2
2023-11-19 23:55:31,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=857340.0, ans=0.2
2023-11-19 23:55:51,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=857473.3333333334, ans=0.125
2023-11-19 23:55:57,921 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.04 vs. limit=15.0
2023-11-19 23:55:59,014 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.50 vs. limit=12.0
2023-11-19 23:56:00,795 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 8400, loss[loss=0.07974, simple_loss=0.1064, pruned_loss=0.01907, audio_tagging_loss=0.007471, over 15375.00 frames. ], tot_loss[loss=0.08346, simple_loss=0.1038, pruned_loss=0.02155, audio_tagging_loss=0.01002, over 3048642.47 frames. ], batch size: 56, lr: 6.24e-03, grad_scale: 16.0
2023-11-19 23:56:20,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=857606.6666666666, ans=0.1
2023-11-19 23:56:21,781 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.79 vs. limit=6.0
2023-11-19 23:56:22,361 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 128650
2023-11-19 23:56:30,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=857673.3333333334, ans=0.0
2023-11-19 23:56:36,150 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.566e+01 8.256e+01 8.921e+01 9.764e+01 1.880e+02, threshold=1.784e+02, percent-clipped=1.0
2023-11-19 23:57:04,625 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 8450, loss[loss=0.07334, simple_loss=0.07981, pruned_loss=0.02298, audio_tagging_loss=0.01045, over 13984.00 frames. ], tot_loss[loss=0.08323, simple_loss=0.1036, pruned_loss=0.02142, audio_tagging_loss=0.01003, over 3043211.45 frames. ], batch size: 55, lr: 6.24e-03, grad_scale: 16.0
2023-11-19 23:57:10,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=857873.3333333334, ans=6.0
2023-11-19 23:57:11,737 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.36 vs. limit=22.5
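The "grad_scale:" figure in the batch summaries drifts between 8.0 and 32.0 across this stretch of the log. That is the signature of dynamic loss scaling in mixed-precision training: the scale doubles after a run of overflow-free steps and halves on overflow. A sketch using standard PyTorch AMP (the hyperparameter values shown are illustrative):

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

def train_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = loss_fn(model(batch))
    scaler.scale(loss).backward()
    scaler.step(optimizer)     # skips the update if inf/nan gradients appear
    scaler.update()            # grows or backs off the scale accordingly
    return scaler.get_scale()  # the "grad_scale" value that would be logged
```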
2023-11-19 23:57:15,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=857873.3333333334, ans=0.1
2023-11-19 23:57:18,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=857940.0, ans=0.0
2023-11-19 23:57:22,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=857940.0, ans=0.2
2023-11-19 23:57:22,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=857940.0, ans=0.125
2023-11-19 23:57:26,126 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 128700
2023-11-19 23:58:05,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=858140.0, ans=0.0
2023-11-19 23:58:08,032 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 8500, loss[loss=0.08151, simple_loss=0.09956, pruned_loss=0.02128, audio_tagging_loss=0.01045, over 15467.00 frames. ], tot_loss[loss=0.0838, simple_loss=0.1036, pruned_loss=0.02182, audio_tagging_loss=0.01016, over 3044707.32 frames. ], batch size: 60, lr: 6.24e-03, grad_scale: 16.0
2023-11-19 23:58:12,208 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.14 vs. limit=15.0
2023-11-19 23:58:22,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=858273.3333333334, ans=0.1
2023-11-19 23:58:30,616 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 128750
2023-11-19 23:58:43,803 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.768e+01 8.248e+01 9.044e+01 1.008e+02 1.243e+02, threshold=1.809e+02, percent-clipped=0.0
2023-11-19 23:59:00,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=858473.3333333334, ans=0.125
2023-11-19 23:59:07,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=858473.3333333334, ans=0.125
2023-11-19 23:59:12,512 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 8550, loss[loss=0.1069, simple_loss=0.1407, pruned_loss=0.03083, audio_tagging_loss=0.005751, over 15477.00 frames. ], tot_loss[loss=0.08402, simple_loss=0.104, pruned_loss=0.02185, audio_tagging_loss=0.01016, over 3046787.09 frames. ], batch size: 57, lr: 6.24e-03, grad_scale: 16.0
2023-11-19 23:59:23,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=858540.0, ans=0.125
2023-11-19 23:59:27,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=858606.6666666666, ans=0.1
2023-11-19 23:59:32,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=858606.6666666666, ans=0.125
2023-11-19 23:59:34,435 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 128800
2023-11-19 23:59:48,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=858673.3333333334, ans=0.125
2023-11-20 00:00:01,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=858740.0, ans=0.2
2023-11-20 00:00:14,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=858806.6666666666, ans=0.1
2023-11-20 00:00:17,222 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 8600, loss[loss=0.07627, simple_loss=0.09666, pruned_loss=0.01746, audio_tagging_loss=0.01048, over 14556.00 frames. ], tot_loss[loss=0.0837, simple_loss=0.1032, pruned_loss=0.02173, audio_tagging_loss=0.01036, over 3043363.96 frames. ], batch size: 55, lr: 6.24e-03, grad_scale: 16.0
2023-11-20 00:00:20,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=858873.3333333334, ans=0.125
2023-11-20 00:00:38,580 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 128850
2023-11-20 00:00:45,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=859006.6666666666, ans=0.2
2023-11-20 00:00:52,604 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.469e+01 8.239e+01 8.842e+01 9.457e+01 1.153e+02, threshold=1.768e+02, percent-clipped=0.0
2023-11-20 00:01:21,498 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 8650, loss[loss=0.05008, simple_loss=0.06011, pruned_loss=0.00749, audio_tagging_loss=0.01254, over 14678.00 frames. ], tot_loss[loss=0.08387, simple_loss=0.1036, pruned_loss=0.02173, audio_tagging_loss=0.01036, over 3052167.24 frames. ], batch size: 59, lr: 6.23e-03, grad_scale: 16.0
2023-11-20 00:01:38,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=859273.3333333334, ans=0.1
2023-11-20 00:01:40,724 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.19 vs. limit=12.0
2023-11-20 00:01:43,244 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 128900
2023-11-20 00:01:43,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=859273.3333333334, ans=0.125
2023-11-20 00:01:44,960 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.46 vs. limit=15.0
2023-11-20 00:02:05,817 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 00:02:11,611 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 00:02:21,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=859473.3333333334, ans=0.2
2023-11-20 00:02:24,863 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 8700, loss[loss=0.1103, simple_loss=0.1403, pruned_loss=0.03141, audio_tagging_loss=0.008697, over 16714.00 frames. ], tot_loss[loss=0.08372, simple_loss=0.1035, pruned_loss=0.02157, audio_tagging_loss=0.01039, over 3052153.09 frames. ], batch size: 60, lr: 6.23e-03, grad_scale: 16.0
2023-11-20 00:02:38,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=859606.6666666666, ans=0.0
2023-11-20 00:02:47,824 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 128950
2023-11-20 00:02:56,817 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.33 vs. limit=15.0
2023-11-20 00:02:59,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=859673.3333333334, ans=0.0
2023-11-20 00:03:01,581 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.745e+01 8.243e+01 8.970e+01 9.872e+01 1.298e+02, threshold=1.794e+02, percent-clipped=0.0
2023-11-20 00:03:05,822 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.79 vs. limit=22.5
2023-11-20 00:03:15,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=859806.6666666666, ans=0.0
2023-11-20 00:03:22,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=859806.6666666666, ans=0.125
2023-11-20 00:03:29,208 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 8750, loss[loss=0.1001, simple_loss=0.1208, pruned_loss=0.03025, audio_tagging_loss=0.009443, over 14496.00 frames. ], tot_loss[loss=0.08493, simple_loss=0.105, pruned_loss=0.02205, audio_tagging_loss=0.01039, over 3062172.59 frames. ], batch size: 55, lr: 6.23e-03, grad_scale: 16.0
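The "lr:" field decays very slowly here (6.28e-03 down to 6.22e-03 over roughly two thousand batches deep into epoch 11), consistent with an Eden-style schedule that anneals in both batch count and epoch. The formula and all constants below are assumptions for illustration, not values read from this run's configuration:

```python
def eden_lr(base_lr: float, step: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Inverse-fourth-root decay in both the step and epoch dimensions.
    step_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * step_factor * epoch_factor

# Deep into training both factors change slowly, which is why the logged lr
# drifts only gently from batch to batch.
print(eden_lr(0.045, step=128000, epoch=10.6))  # ~6.1e-03
```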
2023-11-20 00:03:43,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=859940.0, ans=0.125
2023-11-20 00:03:48,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=859940.0, ans=0.2
2023-11-20 00:03:51,133 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 129000
2023-11-20 00:03:52,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=859940.0, ans=0.125
2023-11-20 00:03:56,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=860006.6666666666, ans=0.0
2023-11-20 00:04:23,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=860140.0, ans=0.0
2023-11-20 00:04:24,383 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.76 vs. limit=15.0
2023-11-20 00:04:27,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=860140.0, ans=0.1
2023-11-20 00:04:33,957 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 8800, loss[loss=0.06666, simple_loss=0.0775, pruned_loss=0.01746, audio_tagging_loss=0.01046, over 16038.00 frames. ], tot_loss[loss=0.08544, simple_loss=0.1057, pruned_loss=0.0222, audio_tagging_loss=0.01042, over 3061318.73 frames. ], batch size: 63, lr: 6.23e-03, grad_scale: 32.0
2023-11-20 00:04:36,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=860206.6666666666, ans=0.1
2023-11-20 00:04:47,114 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.10 vs. limit=10.0
2023-11-20 00:04:55,341 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 129050
2023-11-20 00:04:59,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=860340.0, ans=0.1
2023-11-20 00:05:09,091 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.191e+01 8.422e+01 9.194e+01 1.008e+02 1.237e+02, threshold=1.839e+02, percent-clipped=0.0
2023-11-20 00:05:12,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=860406.6666666666, ans=0.125
2023-11-20 00:05:21,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=860406.6666666666, ans=0.1
2023-11-20 00:05:32,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=860473.3333333334, ans=0.0
2023-11-20 00:05:37,392 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 8850, loss[loss=0.06076, simple_loss=0.07537, pruned_loss=0.01305, audio_tagging_loss=0.01002, over 14893.00 frames. ], tot_loss[loss=0.08547, simple_loss=0.1056, pruned_loss=0.02215, audio_tagging_loss=0.01051, over 3061573.92 frames. ], batch size: 55, lr: 6.23e-03, grad_scale: 32.0
2023-11-20 00:05:45,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=860540.0, ans=0.125
2023-11-20 00:05:52,298 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 00:05:59,803 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 129100
2023-11-20 00:06:43,075 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 8900, loss[loss=0.04687, simple_loss=0.05702, pruned_loss=0.008291, audio_tagging_loss=0.01007, over 15369.00 frames. ], tot_loss[loss=0.08536, simple_loss=0.1058, pruned_loss=0.02219, audio_tagging_loss=0.01029, over 3059710.98 frames. ], batch size: 60, lr: 6.23e-03, grad_scale: 32.0
2023-11-20 00:06:48,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=860873.3333333334, ans=0.1
2023-11-20 00:07:05,263 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 129150
2023-11-20 00:07:18,684 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 8.249e+01 8.942e+01 1.024e+02 1.298e+02, threshold=1.788e+02, percent-clipped=0.0
2023-11-20 00:07:22,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=861073.3333333334, ans=0.125
2023-11-20 00:07:36,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=861140.0, ans=0.125
2023-11-20 00:07:46,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=861206.6666666666, ans=0.1
2023-11-20 00:07:47,620 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 8950, loss[loss=0.08018, simple_loss=0.09179, pruned_loss=0.02578, audio_tagging_loss=0.00851, over 14630.00 frames. ], tot_loss[loss=0.08539, simple_loss=0.106, pruned_loss=0.02226, audio_tagging_loss=0.01014, over 3053210.29 frames. ], batch size: 55, lr: 6.23e-03, grad_scale: 32.0
2023-11-20 00:07:58,448 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.66 vs. limit=22.5
2023-11-20 00:08:09,197 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 129200
2023-11-20 00:08:16,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=861340.0, ans=0.0
2023-11-20 00:08:34,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=861406.6666666666, ans=0.125
2023-11-20 00:08:37,304 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.71 vs. limit=22.5
2023-11-20 00:08:52,278 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 9000, loss[loss=0.08531, simple_loss=0.1133, pruned_loss=0.02005, audio_tagging_loss=0.008617, over 15735.00 frames. ], tot_loss[loss=0.0851, simple_loss=0.1057, pruned_loss=0.02212, audio_tagging_loss=0.01016, over 3060095.27 frames. ], batch size: 57, lr: 6.23e-03, grad_scale: 32.0
], tot_loss[loss=0.0851, simple_loss=0.1057, pruned_loss=0.02212, audio_tagging_loss=0.01016, over 3060095.27 frames. ], batch size: 57, lr: 6.23e-03, grad_scale: 32.0 2023-11-20 00:08:52,281 INFO [train_asr.py:1285] (0/4) Computing validation loss 2023-11-20 00:09:31,840 INFO [train_asr.py:1294] (0/4) Epoch 11, validation: loss=0.06425, simple_loss=0.05461, pruned_loss=0.006061, audio_tagging_loss=0.03088, over 4681554.00 frames. 2023-11-20 00:09:31,840 INFO [train_asr.py:1295] (0/4) Maximum memory allocated so far is 25925MB 2023-11-20 00:09:35,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=861540.0, ans=0.1 2023-11-20 00:09:53,243 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.55 vs. limit=15.0 2023-11-20 00:09:53,263 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.75 vs. limit=6.0 2023-11-20 00:09:53,908 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 129250 2023-11-20 00:09:57,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=861673.3333333334, ans=0.1 2023-11-20 00:10:08,222 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.624e+01 8.204e+01 8.877e+01 9.469e+01 1.301e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-20 00:10:15,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=861740.0, ans=0.0 2023-11-20 00:10:35,645 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 9050, loss[loss=0.1105, simple_loss=0.1303, pruned_loss=0.03878, audio_tagging_loss=0.006526, over 14991.00 frames. ], tot_loss[loss=0.08532, simple_loss=0.1059, pruned_loss=0.02228, audio_tagging_loss=0.0101, over 3054024.30 frames. ], batch size: 55, lr: 6.22e-03, grad_scale: 32.0 2023-11-20 00:10:55,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=861940.0, ans=0.125 2023-11-20 00:10:57,863 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 129300 2023-11-20 00:11:14,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=862073.3333333334, ans=0.2 2023-11-20 00:11:19,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=862073.3333333334, ans=0.0 2023-11-20 00:11:24,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=862073.3333333334, ans=0.125 2023-11-20 00:11:39,810 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 9100, loss[loss=0.05993, simple_loss=0.06819, pruned_loss=0.01455, audio_tagging_loss=0.01128, over 15345.00 frames. ], tot_loss[loss=0.08437, simple_loss=0.1048, pruned_loss=0.02194, audio_tagging_loss=0.01004, over 3054335.54 frames. ], batch size: 57, lr: 6.22e-03, grad_scale: 16.0 2023-11-20 00:11:43,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=862206.6666666666, ans=10.0 2023-11-20 00:11:48,542 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.14 vs. 
limit=15.0 2023-11-20 00:12:02,104 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 129350 2023-11-20 00:12:04,165 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.22 vs. limit=15.0 2023-11-20 00:12:15,354 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.99 vs. limit=15.0 2023-11-20 00:12:17,840 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 8.243e+01 9.006e+01 9.571e+01 1.391e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-20 00:12:19,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=862406.6666666666, ans=0.0 2023-11-20 00:12:20,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=862406.6666666666, ans=0.1 2023-11-20 00:12:20,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=862406.6666666666, ans=0.0 2023-11-20 00:12:33,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=862473.3333333334, ans=0.125 2023-11-20 00:12:43,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=862540.0, ans=0.125 2023-11-20 00:12:44,997 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 9150, loss[loss=0.1116, simple_loss=0.141, pruned_loss=0.03137, audio_tagging_loss=0.009738, over 14680.00 frames. ], tot_loss[loss=0.08382, simple_loss=0.1038, pruned_loss=0.02189, audio_tagging_loss=0.01004, over 3049152.95 frames. ], batch size: 54, lr: 6.22e-03, grad_scale: 16.0 2023-11-20 00:13:06,345 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 129400 2023-11-20 00:13:29,719 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.27 vs. limit=15.0 2023-11-20 00:13:31,652 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.80 vs. limit=12.0 2023-11-20 00:13:33,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=862740.0, ans=0.1 2023-11-20 00:13:35,102 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.30 vs. limit=15.0 2023-11-20 00:13:49,207 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 9200, loss[loss=0.07689, simple_loss=0.09526, pruned_loss=0.0189, audio_tagging_loss=0.01036, over 16202.00 frames. ], tot_loss[loss=0.08344, simple_loss=0.1032, pruned_loss=0.02179, audio_tagging_loss=0.01005, over 3042033.70 frames. 
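The validation pass at batch 9000 above follows the standard pattern: training pauses at a fixed interval, the model is evaluated under no_grad, the loss is averaged weighted by frame count (hence "over 4681554.00 frames"), and peak CUDA memory is reported. A minimal sketch under those assumptions; `compute_loss` is a hypothetical stand-in for the recipe's loss function.

```python
import torch

def run_validation(model, valid_loader, compute_loss, device):
    """Frame-weighted validation loss plus the peak-memory report."""
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = compute_loss(model, batch)  # hypothetical helper
            tot_loss += loss.item() * num_frames
            tot_frames += num_frames
    model.train()
    if device.type == "cuda":
        mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"Maximum memory allocated so far is {mb}MB")
    return tot_loss / tot_frames
```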
], batch size: 62, lr: 6.22e-03, grad_scale: 32.0 2023-11-20 00:13:56,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=862873.3333333334, ans=0.125 2023-11-20 00:14:04,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=862940.0, ans=0.125 2023-11-20 00:14:07,234 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.93 vs. limit=22.5 2023-11-20 00:14:11,574 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 129450 2023-11-20 00:14:18,717 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.28 vs. limit=22.5 2023-11-20 00:14:27,341 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.764e+01 8.355e+01 9.081e+01 9.853e+01 1.317e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-20 00:14:30,706 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.96 vs. limit=15.0 2023-11-20 00:14:33,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=863073.3333333334, ans=0.0 2023-11-20 00:14:46,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=863140.0, ans=0.125 2023-11-20 00:14:54,720 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 9250, loss[loss=0.1058, simple_loss=0.1493, pruned_loss=0.02447, audio_tagging_loss=0.006689, over 16107.00 frames. ], tot_loss[loss=0.083, simple_loss=0.1028, pruned_loss=0.02161, audio_tagging_loss=0.01001, over 3048430.43 frames. ], batch size: 55, lr: 6.22e-03, grad_scale: 32.0 2023-11-20 00:14:55,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=863206.6666666666, ans=0.125 2023-11-20 00:14:56,680 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.32 vs. limit=15.0 2023-11-20 00:15:00,220 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.43 vs. limit=10.0 2023-11-20 00:15:05,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=863206.6666666666, ans=0.1 2023-11-20 00:15:14,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=863273.3333333334, ans=0.125 2023-11-20 00:15:16,863 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 129500 2023-11-20 00:15:20,033 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.81 vs. limit=12.0 2023-11-20 00:15:42,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=863406.6666666666, ans=0.125 2023-11-20 00:15:59,591 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 9300, loss[loss=0.07526, simple_loss=0.08459, pruned_loss=0.01991, audio_tagging_loss=0.01306, over 14650.00 frames. 
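The scaling.py:1022 Whitening lines report a scalar metric against a limit (metric=21.93 vs. limit=22.5 just above), apparently logged when activations approach the constraint. One metric with the right behavior, and plausibly what is tracked here, is c * tr(C^2) / tr(C)^2 for the feature covariance C over c channels: it equals 1 when C is isotropic (perfectly white) and grows as a few directions dominate, with a gradient penalty presumably applied only while it exceeds the limit. A hedged reconstruction; the exact definition in scaling.py may differ.

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """c * tr(C^2) / tr(C)^2 per channel group, averaged over groups.

    Equals 1.0 when the feature covariance C is isotropic; grows as the
    eigenvalue spectrum becomes lopsided (less "white").
    """
    n, d = x.shape
    c = d // num_groups
    xg = x.reshape(n, num_groups, c).transpose(0, 1)      # (groups, n, c)
    covar = torch.matmul(xg.transpose(1, 2), xg) / n      # (groups, c, c)
    tr_c = covar.diagonal(dim1=1, dim2=2).sum(-1)         # tr(C)
    tr_c2 = (covar * covar.transpose(1, 2)).sum((1, 2))   # tr(C^2), C symmetric
    return (c * tr_c2 / tr_c.clamp(min=1e-20) ** 2).mean()

x = torch.randn(100000, 256)      # near-white features
print(whitening_metric(x))        # ~1.0; structured features score higher
```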
], tot_loss[loss=0.08286, simple_loss=0.1022, pruned_loss=0.02156, audio_tagging_loss=0.01021, over 3045983.41 frames. ], batch size: 57, lr: 6.22e-03, grad_scale: 32.0 2023-11-20 00:16:06,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=863540.0, ans=0.0 2023-11-20 00:16:10,377 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.36 vs. limit=10.0 2023-11-20 00:16:18,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=863606.6666666666, ans=0.2 2023-11-20 00:16:19,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=863606.6666666666, ans=0.125 2023-11-20 00:16:19,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=863606.6666666666, ans=0.1 2023-11-20 00:16:21,308 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 129550 2023-11-20 00:16:26,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=863673.3333333334, ans=0.0 2023-11-20 00:16:36,557 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.891e+01 8.083e+01 9.100e+01 9.829e+01 1.304e+02, threshold=1.820e+02, percent-clipped=0.0 2023-11-20 00:16:50,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=863806.6666666666, ans=0.2 2023-11-20 00:17:03,678 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 9350, loss[loss=0.05566, simple_loss=0.06047, pruned_loss=0.01146, audio_tagging_loss=0.01396, over 14437.00 frames. ], tot_loss[loss=0.08325, simple_loss=0.1028, pruned_loss=0.0216, audio_tagging_loss=0.01023, over 3045047.16 frames. ], batch size: 56, lr: 6.22e-03, grad_scale: 32.0 2023-11-20 00:17:16,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=863940.0, ans=0.125 2023-11-20 00:17:26,556 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 129600 2023-11-20 00:17:28,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=863940.0, ans=0.125 2023-11-20 00:17:46,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=864073.3333333334, ans=0.2 2023-11-20 00:17:55,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=864140.0, ans=0.0 2023-11-20 00:18:05,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=864140.0, ans=0.0 2023-11-20 00:18:09,197 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 9400, loss[loss=0.08601, simple_loss=0.09664, pruned_loss=0.02756, audio_tagging_loss=0.01014, over 14538.00 frames. ], tot_loss[loss=0.08355, simple_loss=0.1032, pruned_loss=0.02162, audio_tagging_loss=0.01033, over 3041521.34 frames. 
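The tot_loss decomposition in these lines is consistent with the scales in the run header (simple_loss_scale 0.5, audio_tagging_loss_scale 1.0, CTC disabled): total = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss, accumulated as a frame-weighted running average (the "over N frames" figure). Checking the batch 9400 totals just above:

```python
# Reconstructing the logged total from its parts, using the scales in the
# run configuration (simple_loss_scale=0.5, audio_tagging_loss_scale=1.0).
simple_loss, pruned_loss, audio_tagging_loss = 0.1032, 0.02162, 0.01033
loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss
print(f"{loss:.5f}")  # 0.08355, matching tot_loss[loss=0.08355, ...]
```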
], batch size: 56, lr: 6.22e-03, grad_scale: 32.0 2023-11-20 00:18:10,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=864206.6666666666, ans=0.125 2023-11-20 00:18:32,242 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 129650 2023-11-20 00:18:36,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=864340.0, ans=0.125 2023-11-20 00:18:46,746 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.773e+01 8.451e+01 9.430e+01 1.021e+02 1.598e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-20 00:18:55,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=864406.6666666666, ans=0.07 2023-11-20 00:19:05,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=864473.3333333334, ans=0.1 2023-11-20 00:19:05,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=864473.3333333334, ans=0.0 2023-11-20 00:19:09,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=864473.3333333334, ans=0.0 2023-11-20 00:19:12,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=864473.3333333334, ans=0.125 2023-11-20 00:19:14,318 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 9450, loss[loss=0.07147, simple_loss=0.09349, pruned_loss=0.01442, audio_tagging_loss=0.0103, over 14648.00 frames. ], tot_loss[loss=0.08309, simple_loss=0.1024, pruned_loss=0.0214, audio_tagging_loss=0.01047, over 3046923.98 frames. ], batch size: 57, lr: 6.21e-03, grad_scale: 32.0 2023-11-20 00:19:14,372 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 00:19:32,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=864606.6666666666, ans=0.0 2023-11-20 00:19:35,924 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 129700 2023-11-20 00:19:58,449 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0 2023-11-20 00:20:18,818 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 9500, loss[loss=0.08432, simple_loss=0.1175, pruned_loss=0.01639, audio_tagging_loss=0.009185, over 14823.00 frames. ], tot_loss[loss=0.08372, simple_loss=0.1034, pruned_loss=0.02163, audio_tagging_loss=0.01038, over 3051938.85 frames. 
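The optim.py:476 lines decode neatly: the five numbers are min/25%/50%/75%/max of recent gradient norms, and the clipping threshold is Clipping_scale times the median, e.g. 2.0 x 9.430e+01 = 1.886e+02 in the entry above (likewise 2.0 x 8.942e+01 = 1.788e+02 earlier); percent-clipped is the fraction of updates whose norm exceeded it. A small sketch of the same bookkeeping, assuming the five-number summary is taken over a window of recent norms:

```python
import torch

def clipping_report(grad_norms, clipping_scale=2.0):
    """Five-number summary of recent grad norms plus the derived threshold."""
    g = torch.tensor(sorted(grad_norms), dtype=torch.float32)
    q = [g[int(p * (len(g) - 1))].item() for p in (0.0, 0.25, 0.5, 0.75, 1.0)]
    threshold = clipping_scale * q[2]                  # 2x the median
    clipped = 100.0 * (g > threshold).float().mean().item()
    print("grad-norm quartiles "
          + " ".join(f"{v:.3e}" for v in q)
          + f", threshold={threshold:.3e}, percent-clipped={clipped:.1f}")

clipping_report([67.7, 84.5, 94.3, 102.1, 159.8])
# -> ... threshold=1.886e+02, percent-clipped=0.0, as in the entry above
```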
], batch size: 55, lr: 6.21e-03, grad_scale: 32.0 2023-11-20 00:20:40,559 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 129750 2023-11-20 00:20:57,186 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.873e+01 8.143e+01 9.051e+01 9.932e+01 1.802e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-20 00:21:06,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=865073.3333333334, ans=0.0 2023-11-20 00:21:07,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=865073.3333333334, ans=0.05 2023-11-20 00:21:23,872 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 9550, loss[loss=0.0946, simple_loss=0.1102, pruned_loss=0.0258, audio_tagging_loss=0.01372, over 14354.00 frames. ], tot_loss[loss=0.0834, simple_loss=0.1031, pruned_loss=0.02142, audio_tagging_loss=0.01045, over 3049982.99 frames. ], batch size: 58, lr: 6.21e-03, grad_scale: 32.0 2023-11-20 00:21:34,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=865206.6666666666, ans=0.2 2023-11-20 00:21:46,645 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 129800 2023-11-20 00:21:48,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=865273.3333333334, ans=0.125 2023-11-20 00:22:22,560 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2023-11-20 00:22:24,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=865473.3333333334, ans=0.125 2023-11-20 00:22:29,132 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 9600, loss[loss=0.09642, simple_loss=0.1247, pruned_loss=0.02488, audio_tagging_loss=0.009166, over 15984.00 frames. ], tot_loss[loss=0.0842, simple_loss=0.104, pruned_loss=0.02171, audio_tagging_loss=0.01046, over 3048247.62 frames. ], batch size: 56, lr: 6.21e-03, grad_scale: 32.0 2023-11-20 00:22:33,627 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 00:22:50,665 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 129850 2023-11-20 00:22:56,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=865673.3333333334, ans=0.0 2023-11-20 00:23:02,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=865673.3333333334, ans=0.0 2023-11-20 00:23:05,965 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.678e+01 8.225e+01 8.966e+01 9.703e+01 1.238e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-20 00:23:11,777 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.40 vs. limit=15.0 2023-11-20 00:23:33,764 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 9650, loss[loss=0.07006, simple_loss=0.07786, pruned_loss=0.01744, audio_tagging_loss=0.01368, over 14012.00 frames. ], tot_loss[loss=0.08384, simple_loss=0.1035, pruned_loss=0.02159, audio_tagging_loss=0.01048, over 3043149.02 frames. 
], batch size: 54, lr: 6.21e-03, grad_scale: 32.0 2023-11-20 00:23:42,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=865873.3333333334, ans=0.125 2023-11-20 00:23:55,431 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 129900 2023-11-20 00:24:37,661 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 9700, loss[loss=0.08151, simple_loss=0.09915, pruned_loss=0.02279, audio_tagging_loss=0.009151, over 14417.00 frames. ], tot_loss[loss=0.08412, simple_loss=0.1039, pruned_loss=0.02186, audio_tagging_loss=0.01029, over 3046914.24 frames. ], batch size: 55, lr: 6.21e-03, grad_scale: 32.0 2023-11-20 00:24:47,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=866206.6666666666, ans=0.0 2023-11-20 00:24:49,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=866273.3333333334, ans=0.2 2023-11-20 00:24:55,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=866273.3333333334, ans=0.0 2023-11-20 00:24:59,570 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 129950 2023-11-20 00:25:11,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=866340.0, ans=0.125 2023-11-20 00:25:14,938 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.210e+01 8.274e+01 8.934e+01 1.009e+02 1.297e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-20 00:25:25,907 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=7.957e-03 2023-11-20 00:25:41,621 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 9750, loss[loss=0.09522, simple_loss=0.12, pruned_loss=0.02761, audio_tagging_loss=0.007622, over 15572.00 frames. ], tot_loss[loss=0.08355, simple_loss=0.1033, pruned_loss=0.02179, audio_tagging_loss=0.01009, over 3046960.69 frames. ], batch size: 58, lr: 6.21e-03, grad_scale: 32.0 2023-11-20 00:25:43,352 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.06 vs. limit=15.0 2023-11-20 00:25:49,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=866540.0, ans=0.2 2023-11-20 00:26:04,588 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 130000 2023-11-20 00:26:20,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=866740.0, ans=10.0 2023-11-20 00:26:25,945 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.46 vs. limit=12.0 2023-11-20 00:26:47,954 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 9800, loss[loss=0.08422, simple_loss=0.09936, pruned_loss=0.02603, audio_tagging_loss=0.008515, over 15845.00 frames. ], tot_loss[loss=0.08296, simple_loss=0.1026, pruned_loss=0.02164, audio_tagging_loss=0.01004, over 3038559.18 frames. 
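The learning rate in these lines creeps from 6.23e-03 down to 6.21e-03 over a few hundred batches, consistent with icefall's Eden schedule as configured in the header (base_lr 0.045, lr_batches 7500, lr_epochs 3.5): the base LR is damped by smooth power laws in both the batch and the epoch count. A sketch of that rule; the fractional epoch fed in below is an assumption (the recipe scales epoch progress by ref_duration), chosen to show the formula lands in the logged range.

```python
def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    """Eden-style LR: smooth power-law decay in batch and epoch count."""
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# batch_idx_train is ~130k here; an effective epoch count near 10 gives
print(f"{eden_lr(0.045, 130000, 10.0):.2e}")  # 6.21e-03, as logged
```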
], batch size: 58, lr: 6.21e-03, grad_scale: 16.0 2023-11-20 00:26:53,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=866873.3333333334, ans=0.0 2023-11-20 00:27:05,280 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.17 vs. limit=15.0 2023-11-20 00:27:09,797 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 130050 2023-11-20 00:27:10,546 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2023-11-20 00:27:12,544 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 00:27:14,145 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.10 vs. limit=15.0 2023-11-20 00:27:26,358 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.248e+01 8.368e+01 8.974e+01 9.697e+01 1.703e+02, threshold=1.795e+02, percent-clipped=0.0 2023-11-20 00:27:32,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=867073.3333333334, ans=0.125 2023-11-20 00:27:37,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=867073.3333333334, ans=0.125 2023-11-20 00:27:47,320 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 00:27:47,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=867140.0, ans=0.1 2023-11-20 00:27:52,248 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 9850, loss[loss=0.09564, simple_loss=0.1249, pruned_loss=0.02382, audio_tagging_loss=0.009371, over 15439.00 frames. ], tot_loss[loss=0.08345, simple_loss=0.1035, pruned_loss=0.0217, audio_tagging_loss=0.01001, over 3036045.46 frames. 
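Each of these WARNINGs drops a 1-second AudioSet cut whose transcript is the dummy placeholder: after the frontend's 4x subsampling, 100 input frames keep only 23, which cannot cover 24 BPE tokens, so no transducer alignment exists. The zipformer recipes usually compute the subsampled length as T' = ((T - 7) // 2 + 1) // 2, which maps 100 to exactly the logged 23; a sketch of such a filter (the function name is illustrative):

```python
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Drop cuts whose subsampled length cannot cover the token sequence."""
    t_sub = ((num_frames - 7) // 2 + 1) // 2   # 4x conv subsampling with edges
    return t_sub >= num_tokens

print(((100 - 7) // 2 + 1) // 2)   # 23, matching the warning above
print(keep_cut(100, 24))           # False -> the cut is excluded
```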
], batch size: 57, lr: 6.21e-03, grad_scale: 16.0 2023-11-20 00:27:58,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=867206.6666666666, ans=0.125 2023-11-20 00:28:00,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=867206.6666666666, ans=0.125 2023-11-20 00:28:01,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=867206.6666666666, ans=0.05 2023-11-20 00:28:05,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=867273.3333333334, ans=0.125 2023-11-20 00:28:14,645 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 130100 2023-11-20 00:28:30,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=867406.6666666666, ans=0.0 2023-11-20 00:28:33,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=867406.6666666666, ans=0.125 2023-11-20 00:28:33,836 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.55 vs. limit=22.5 2023-11-20 00:28:34,721 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.07 vs. limit=15.0 2023-11-20 00:28:37,030 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 00:28:52,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=867473.3333333334, ans=0.0 2023-11-20 00:28:57,192 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 9900, loss[loss=0.08836, simple_loss=0.1091, pruned_loss=0.02409, audio_tagging_loss=0.00973, over 14912.00 frames. ], tot_loss[loss=0.08413, simple_loss=0.1047, pruned_loss=0.02182, audio_tagging_loss=0.009966, over 3039335.34 frames. ], batch size: 58, lr: 6.20e-03, grad_scale: 16.0 2023-11-20 00:29:00,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=867540.0, ans=0.125 2023-11-20 00:29:00,487 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.47 vs. 
limit=15.0 2023-11-20 00:29:01,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=867540.0, ans=0.2 2023-11-20 00:29:16,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=867606.6666666666, ans=0.125 2023-11-20 00:29:18,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=867606.6666666666, ans=0.0 2023-11-20 00:29:20,366 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 130150 2023-11-20 00:29:36,777 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.726e+01 8.221e+01 8.937e+01 9.593e+01 1.338e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-20 00:29:43,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=867740.0, ans=0.125 2023-11-20 00:30:02,309 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 9950, loss[loss=0.06087, simple_loss=0.07115, pruned_loss=0.01126, audio_tagging_loss=0.01404, over 14522.00 frames. ], tot_loss[loss=0.08323, simple_loss=0.1032, pruned_loss=0.02152, audio_tagging_loss=0.01009, over 3033885.76 frames. ], batch size: 55, lr: 6.20e-03, grad_scale: 16.0 2023-11-20 00:30:24,646 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 130200 2023-11-20 00:30:26,795 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.92 vs. limit=22.5 2023-11-20 00:30:51,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=868073.3333333334, ans=0.0 2023-11-20 00:31:06,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=868206.6666666666, ans=0.09899494936611666 2023-11-20 00:31:07,027 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 10000, loss[loss=0.0862, simple_loss=0.1089, pruned_loss=0.02349, audio_tagging_loss=0.008273, over 15043.00 frames. ], tot_loss[loss=0.08327, simple_loss=0.1034, pruned_loss=0.02159, audio_tagging_loss=0.009988, over 3029370.42 frames. ], batch size: 55, lr: 6.20e-03, grad_scale: 32.0 2023-11-20 00:31:11,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=868206.6666666666, ans=0.1 2023-11-20 00:31:14,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=868206.6666666666, ans=0.0 2023-11-20 00:31:28,546 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 130250 2023-11-20 00:31:33,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=868340.0, ans=0.0 2023-11-20 00:31:44,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=868406.6666666666, ans=0.125 2023-11-20 00:31:45,674 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.042e+01 8.141e+01 8.733e+01 9.527e+01 1.222e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-20 00:32:12,012 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 10050, loss[loss=0.1006, simple_loss=0.1175, pruned_loss=0.03006, audio_tagging_loss=0.01182, over 14681.00 frames. 
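grad_scale in the loss lines is the fp16 dynamic loss scale (use_fp16: true in the header): it fell from 32.0 to 16.0 around batch 9100, presumably after an overflowing gradient, and is back at 32.0 by batch 10000 after a run of clean steps. PyTorch's GradScaler implements exactly this halve-on-overflow, double-after-growth_interval policy; the constructor arguments below are real GradScaler knobs with illustrative values.

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,        # matches the grad_scale logged here
    growth_factor=2.0,      # 16.0 -> 32.0 after enough clean steps
    backoff_factor=0.5,     # 32.0 -> 16.0 on an inf/nan gradient
    growth_interval=2000,
)

# Typical use inside a training step (model/optimizer/loss_fn assumed):
#   with torch.cuda.amp.autocast():
#       loss = loss_fn(model(batch))
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)
#   scaler.update()
print(scaler.get_scale())   # 32.0 on a CUDA machine (disabled without one)
```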
], tot_loss[loss=0.08325, simple_loss=0.1031, pruned_loss=0.02166, audio_tagging_loss=0.01004, over 3033520.99 frames. ], batch size: 56, lr: 6.20e-03, grad_scale: 32.0 2023-11-20 00:32:17,582 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.23 vs. limit=15.0 2023-11-20 00:32:22,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=868540.0, ans=0.125 2023-11-20 00:32:33,897 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 130300 2023-11-20 00:32:35,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=868606.6666666666, ans=0.0 2023-11-20 00:32:35,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=868606.6666666666, ans=0.2 2023-11-20 00:32:42,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=868673.3333333334, ans=0.125 2023-11-20 00:32:44,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=868673.3333333334, ans=0.0 2023-11-20 00:32:53,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=868740.0, ans=0.0 2023-11-20 00:33:04,433 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.84 vs. limit=15.0 2023-11-20 00:33:16,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=868873.3333333334, ans=0.125 2023-11-20 00:33:17,434 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 10100, loss[loss=0.06083, simple_loss=0.07132, pruned_loss=0.01487, audio_tagging_loss=0.0103, over 14601.00 frames. ], tot_loss[loss=0.08385, simple_loss=0.1038, pruned_loss=0.02182, audio_tagging_loss=0.01012, over 3037841.41 frames. 
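The model.py:792 line fires every 50 batches and keeps confirming Freeze_encoder: False, matching freeze_encoder=false and freeze_encoder_steps=-1 in the header. A plausible implementation of that switch, for a model exposing an `encoder` submodule; the logic is assumed, only the option names come from the configuration:

```python
def maybe_freeze_encoder(model, batch_idx, freeze_encoder_steps=-1, log_every=50):
    """Hold encoder weights fixed for the first freeze_encoder_steps batches.

    With freeze_encoder_steps=-1, as in this run, nothing is ever frozen.
    """
    freeze = 0 <= batch_idx < freeze_encoder_steps
    for p in model.encoder.parameters():
        p.requires_grad = not freeze
    if batch_idx % log_every == 0:
        print(f"Freeze_encoder: {freeze}; Current batch idx: {batch_idx}")
```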
], batch size: 56, lr: 6.20e-03, grad_scale: 32.0 2023-11-20 00:33:19,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=868873.3333333334, ans=0.125 2023-11-20 00:33:26,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=868873.3333333334, ans=0.1 2023-11-20 00:33:36,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=868940.0, ans=0.1 2023-11-20 00:33:39,572 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 130350 2023-11-20 00:33:44,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=869006.6666666666, ans=0.125 2023-11-20 00:33:55,271 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.691e+01 8.143e+01 8.992e+01 9.764e+01 1.668e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-20 00:33:55,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=869073.3333333334, ans=10.0 2023-11-20 00:33:55,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=869073.3333333334, ans=0.0 2023-11-20 00:34:11,661 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 00:34:20,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=869206.6666666666, ans=0.125 2023-11-20 00:34:21,583 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 10150, loss[loss=0.08284, simple_loss=0.09976, pruned_loss=0.02101, audio_tagging_loss=0.01196, over 16245.00 frames. ], tot_loss[loss=0.08355, simple_loss=0.1033, pruned_loss=0.02164, audio_tagging_loss=0.01025, over 3045825.80 frames. ], batch size: 61, lr: 6.20e-03, grad_scale: 32.0 2023-11-20 00:34:31,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=869206.6666666666, ans=0.2 2023-11-20 00:34:36,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=869273.3333333334, ans=0.0 2023-11-20 00:34:38,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=869273.3333333334, ans=0.1 2023-11-20 00:34:39,312 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.63 vs. limit=12.0 2023-11-20 00:34:43,466 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 130400 2023-11-20 00:34:48,564 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.04 vs. limit=15.0 2023-11-20 00:34:54,946 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 00:34:58,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=869340.0, ans=0.0 2023-11-20 00:35:06,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=869406.6666666666, ans=0.1 2023-11-20 00:35:06,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=869406.6666666666, ans=0.125 2023-11-20 00:35:24,273 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.33 vs. limit=15.0 2023-11-20 00:35:25,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=869473.3333333334, ans=0.0 2023-11-20 00:35:26,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=869540.0, ans=0.125 2023-11-20 00:35:27,276 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 10200, loss[loss=0.09615, simple_loss=0.118, pruned_loss=0.02587, audio_tagging_loss=0.0113, over 14392.00 frames. ], tot_loss[loss=0.08261, simple_loss=0.1018, pruned_loss=0.02132, audio_tagging_loss=0.01037, over 3046195.25 frames. ], batch size: 53, lr: 6.20e-03, grad_scale: 32.0 2023-11-20 00:35:37,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=869540.0, ans=0.2 2023-11-20 00:35:45,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=869606.6666666666, ans=0.0 2023-11-20 00:35:49,329 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 130450 2023-11-20 00:35:54,300 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 00:35:54,803 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.56 vs. 
limit=15.0 2023-11-20 00:35:55,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=869673.3333333334, ans=0.125 2023-11-20 00:36:02,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=869673.3333333334, ans=0.0 2023-11-20 00:36:04,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=869673.3333333334, ans=0.0 2023-11-20 00:36:06,667 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.786e+01 8.250e+01 8.852e+01 1.003e+02 1.443e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-20 00:36:19,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=869806.6666666666, ans=0.0 2023-11-20 00:36:32,646 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 10250, loss[loss=0.09554, simple_loss=0.1279, pruned_loss=0.02348, audio_tagging_loss=0.008103, over 15437.00 frames. ], tot_loss[loss=0.08266, simple_loss=0.1019, pruned_loss=0.02132, audio_tagging_loss=0.0104, over 3047615.92 frames. ], batch size: 57, lr: 6.20e-03, grad_scale: 32.0 2023-11-20 00:36:34,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=869873.3333333334, ans=0.0 2023-11-20 00:36:53,524 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 130500 2023-11-20 00:37:13,067 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 00:37:15,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=870073.3333333334, ans=0.0 2023-11-20 00:37:36,721 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 10300, loss[loss=0.05226, simple_loss=0.06024, pruned_loss=0.01198, audio_tagging_loss=0.01015, over 15093.00 frames. ], tot_loss[loss=0.08327, simple_loss=0.1031, pruned_loss=0.02133, audio_tagging_loss=0.01039, over 3044259.75 frames. ], batch size: 57, lr: 6.19e-03, grad_scale: 32.0 2023-11-20 00:37:42,335 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.19 vs. limit=15.0 2023-11-20 00:37:45,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=870206.6666666666, ans=0.125 2023-11-20 00:37:57,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=870273.3333333334, ans=0.0 2023-11-20 00:37:59,505 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 130550 2023-11-20 00:38:16,963 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.312e+01 8.334e+01 9.071e+01 9.729e+01 1.396e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-20 00:38:19,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=870406.6666666666, ans=0.125 2023-11-20 00:38:22,572 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.39 vs. 
limit=10.0 2023-11-20 00:38:35,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=870473.3333333334, ans=0.125 2023-11-20 00:38:41,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=870540.0, ans=0.0 2023-11-20 00:38:42,749 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 10350, loss[loss=0.06755, simple_loss=0.07422, pruned_loss=0.01789, audio_tagging_loss=0.01256, over 15796.00 frames. ], tot_loss[loss=0.08311, simple_loss=0.1028, pruned_loss=0.0212, audio_tagging_loss=0.01053, over 3048209.59 frames. ], batch size: 60, lr: 6.19e-03, grad_scale: 32.0 2023-11-20 00:38:44,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=870540.0, ans=0.2 2023-11-20 00:39:01,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=870606.6666666666, ans=0.125 2023-11-20 00:39:02,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=870606.6666666666, ans=0.125 2023-11-20 00:39:05,161 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 130600 2023-11-20 00:39:47,660 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 10400, loss[loss=0.09212, simple_loss=0.1233, pruned_loss=0.0231, audio_tagging_loss=0.007357, over 15391.00 frames. ], tot_loss[loss=0.08312, simple_loss=0.1028, pruned_loss=0.02118, audio_tagging_loss=0.01053, over 3044898.07 frames. ], batch size: 57, lr: 6.19e-03, grad_scale: 32.0 2023-11-20 00:40:03,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=870940.0, ans=0.125 2023-11-20 00:40:09,411 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 130650 2023-11-20 00:40:24,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=871006.6666666666, ans=0.1 2023-11-20 00:40:24,643 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.23 vs. limit=15.0 2023-11-20 00:40:26,386 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.869e+01 8.319e+01 9.012e+01 9.651e+01 1.388e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-20 00:40:38,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=871140.0, ans=0.125 2023-11-20 00:40:42,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=871140.0, ans=0.125 2023-11-20 00:40:42,853 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.19 vs. limit=15.0 2023-11-20 00:40:51,820 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.87 vs. limit=10.0 2023-11-20 00:40:52,094 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 10450, loss[loss=0.1271, simple_loss=0.1575, pruned_loss=0.04137, audio_tagging_loss=0.006985, over 16087.00 frames. 
], tot_loss[loss=0.08331, simple_loss=0.1029, pruned_loss=0.02135, audio_tagging_loss=0.01049, over 3048578.16 frames. ], batch size: 58, lr: 6.19e-03, grad_scale: 32.0 2023-11-20 00:41:14,136 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 130700 2023-11-20 00:41:15,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=871273.3333333334, ans=0.2 2023-11-20 00:41:38,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=871406.6666666666, ans=0.0 2023-11-20 00:41:40,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=871406.6666666666, ans=0.125 2023-11-20 00:41:56,760 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 10500, loss[loss=0.08843, simple_loss=0.1109, pruned_loss=0.02249, audio_tagging_loss=0.01049, over 14745.00 frames. ], tot_loss[loss=0.08234, simple_loss=0.102, pruned_loss=0.02103, audio_tagging_loss=0.01033, over 3044924.84 frames. ], batch size: 56, lr: 6.19e-03, grad_scale: 32.0 2023-11-20 00:42:02,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=871540.0, ans=0.09899494936611666 2023-11-20 00:42:07,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=871540.0, ans=0.0 2023-11-20 00:42:19,581 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 130750 2023-11-20 00:42:21,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=871606.6666666666, ans=0.1 2023-11-20 00:42:22,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=871673.3333333334, ans=0.0 2023-11-20 00:42:26,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=871673.3333333334, ans=0.1 2023-11-20 00:42:26,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=871673.3333333334, ans=0.125 2023-11-20 00:42:33,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=871673.3333333334, ans=0.0 2023-11-20 00:42:35,581 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.944e+01 8.359e+01 9.035e+01 1.000e+02 1.181e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-20 00:42:50,917 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.10 vs. limit=15.0 2023-11-20 00:43:01,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=871873.3333333334, ans=0.125 2023-11-20 00:43:01,875 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 10550, loss[loss=0.0794, simple_loss=0.08879, pruned_loss=0.0218, audio_tagging_loss=0.0132, over 15448.00 frames. ], tot_loss[loss=0.08227, simple_loss=0.102, pruned_loss=0.02096, audio_tagging_loss=0.01032, over 3038192.88 frames. 
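The per-batch cut counts logged here hover between roughly 52 and 62 rather than staying fixed, because batches are packed up to a duration budget (max_duration 1000 s in the header) instead of a cut count. With bucketing_sampler disabled, lhotse's SimpleCutSampler is the natural fit; a sketch, with an illustrative manifest path:

```python
from lhotse import CutSet
from lhotse.dataset import SimpleCutSampler

cuts = CutSet.from_file("data/fbank/cuts_train.jsonl.gz")  # illustrative path
sampler = SimpleCutSampler(
    cuts,
    max_duration=1000.0,   # seconds of audio per batch, as configured
    shuffle=True,
    drop_last=True,
)
for batch_cuts in sampler:
    print(len(batch_cuts))  # varies batch to batch, e.g. 52-62 cuts
    break
```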
], batch size: 58, lr: 6.19e-03, grad_scale: 32.0 2023-11-20 00:43:02,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=871873.3333333334, ans=0.125 2023-11-20 00:43:07,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=871873.3333333334, ans=0.125 2023-11-20 00:43:10,353 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.81 vs. limit=22.5 2023-11-20 00:43:13,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=871940.0, ans=0.1 2023-11-20 00:43:23,342 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 130800 2023-11-20 00:43:43,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=872073.3333333334, ans=0.0 2023-11-20 00:43:50,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=872073.3333333334, ans=0.125 2023-11-20 00:43:53,918 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 00:43:59,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=872140.0, ans=0.125 2023-11-20 00:44:06,381 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 10600, loss[loss=0.06605, simple_loss=0.06902, pruned_loss=0.01593, audio_tagging_loss=0.0156, over 14679.00 frames. ], tot_loss[loss=0.08263, simple_loss=0.1027, pruned_loss=0.02113, audio_tagging_loss=0.01015, over 3036708.64 frames. ], batch size: 58, lr: 6.19e-03, grad_scale: 16.0 2023-11-20 00:44:12,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=872206.6666666666, ans=0.1 2023-11-20 00:44:14,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=872206.6666666666, ans=0.125 2023-11-20 00:44:22,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=872273.3333333334, ans=0.125 2023-11-20 00:44:24,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=872273.3333333334, ans=0.1 2023-11-20 00:44:27,760 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 130850 2023-11-20 00:44:30,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=872340.0, ans=0.125 2023-11-20 00:44:32,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=872340.0, ans=0.1 2023-11-20 00:44:46,888 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.153e+01 8.252e+01 9.029e+01 9.863e+01 1.267e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-20 00:45:10,719 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 10650, loss[loss=0.1224, simple_loss=0.1569, pruned_loss=0.03854, audio_tagging_loss=0.005343, over 14012.00 frames. ], tot_loss[loss=0.08293, simple_loss=0.103, pruned_loss=0.02136, audio_tagging_loss=0.01009, over 3037986.78 frames. 
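The scaling.py:1118 WithLoss lines attach an auxiliary loss to the self-attention weights and log its accumulated value (loss-sum, often exactly 0 when the weights are well behaved). One way to attach a loss to an intermediate tensor without changing its forward value is a custom autograd function that is the identity in forward and injects the penalty's gradient in backward; the sketch below does that with a toy magnitude penalty, illustrating only the mechanism, not the actual penalty in scaling.py.

```python
import torch

class WithLoss(torch.autograd.Function):
    """Identity in forward; adds the gradient of an auxiliary loss in backward."""

    @staticmethod
    def forward(ctx, x, aux_scale):
        ctx.aux_scale = aux_scale
        ctx.save_for_backward(x)
        return x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # toy auxiliary penalty aux_scale * sum(x^2); its gradient is 2*s*x
        return grad_output + 2.0 * ctx.aux_scale * x, None

attn = torch.rand(4, 8, requires_grad=True)
out = WithLoss.apply(attn, 1e-4)       # forward value is unchanged
out.sum().backward()                   # grad now includes the penalty term
print(attn.grad.shape)
```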
], batch size: 52, lr: 6.19e-03, grad_scale: 16.0 2023-11-20 00:45:25,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=872606.6666666666, ans=0.125 2023-11-20 00:45:30,875 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.72 vs. limit=15.0 2023-11-20 00:45:32,913 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 130900 2023-11-20 00:45:32,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=872606.6666666666, ans=0.2 2023-11-20 00:45:40,172 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.77 vs. limit=15.0 2023-11-20 00:45:49,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=872740.0, ans=0.125 2023-11-20 00:45:51,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=872740.0, ans=0.125 2023-11-20 00:46:02,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=872806.6666666666, ans=0.125 2023-11-20 00:46:07,391 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.26 vs. limit=22.5 2023-11-20 00:46:14,900 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 10700, loss[loss=0.09184, simple_loss=0.1169, pruned_loss=0.02429, audio_tagging_loss=0.009106, over 15825.00 frames. ], tot_loss[loss=0.083, simple_loss=0.103, pruned_loss=0.0214, audio_tagging_loss=0.01008, over 3035365.88 frames. ], batch size: 56, lr: 6.19e-03, grad_scale: 16.0 2023-11-20 00:46:22,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=872873.3333333334, ans=0.2 2023-11-20 00:46:34,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=872940.0, ans=0.1 2023-11-20 00:46:37,377 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 130950 2023-11-20 00:46:42,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=873006.6666666666, ans=0.125 2023-11-20 00:46:46,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=873006.6666666666, ans=0.125 2023-11-20 00:46:47,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=873006.6666666666, ans=0.125 2023-11-20 00:46:55,061 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.497e+01 8.315e+01 9.053e+01 9.865e+01 1.273e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-20 00:47:14,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=873140.0, ans=0.1 2023-11-20 00:47:20,598 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 10750, loss[loss=0.09551, simple_loss=0.1215, pruned_loss=0.02505, audio_tagging_loss=0.009712, over 15883.00 frames. 
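Many of the ScheduledFloat names above belong to balancers (balancer1.prob annealed to ans=0.125, plus constraints like min_positive=0.05 and max_abs=10.0 elsewhere in this log). The names suggest a module that keeps activation statistics inside a target range, the fraction of positive values and the magnitude, applying its correction with probability prob. The real module presumably acts on gradients; the forward-only check below merely illustrates the statistics being constrained.

```python
import torch

def balancer_stats(x: torch.Tensor, min_positive=0.05, max_abs=10.0):
    """Check Balancer-style constraints on a batch of activations."""
    frac_positive = (x > 0).float().mean().item()
    return {
        "frac_positive": frac_positive,
        "positive_ok": frac_positive >= min_positive,
        "rms": x.pow(2).mean().sqrt().item(),
        "abs_ok": x.abs().max().item() <= max_abs,
    }

x = torch.randn(1000) * 2.0
print(balancer_stats(x))   # ~50% positive, magnitudes well under 10.0
```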
], tot_loss[loss=0.08347, simple_loss=0.1038, pruned_loss=0.02155, audio_tagging_loss=0.01002, over 3037682.40 frames. ], batch size: 56, lr: 6.18e-03, grad_scale: 16.0 2023-11-20 00:47:25,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=873206.6666666666, ans=0.125 2023-11-20 00:47:30,904 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.04 vs. limit=15.0 2023-11-20 00:47:41,977 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 131000 2023-11-20 00:48:23,384 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 00:48:24,197 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 10800, loss[loss=0.08389, simple_loss=0.1094, pruned_loss=0.01886, audio_tagging_loss=0.01034, over 14637.00 frames. ], tot_loss[loss=0.08395, simple_loss=0.1046, pruned_loss=0.02176, audio_tagging_loss=0.009905, over 3043381.73 frames. ], batch size: 55, lr: 6.18e-03, grad_scale: 16.0 2023-11-20 00:48:27,974 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.32 vs. limit=15.0 2023-11-20 00:48:28,024 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.30 vs. limit=15.0 2023-11-20 00:48:42,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=873606.6666666666, ans=0.125 2023-11-20 00:48:43,209 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2023-11-20 00:48:46,064 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 131050 2023-11-20 00:48:48,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=873606.6666666666, ans=0.0 2023-11-20 00:49:04,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=873740.0, ans=12.0 2023-11-20 00:49:05,239 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.697e+01 8.209e+01 8.933e+01 9.655e+01 1.364e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-20 00:49:10,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=873740.0, ans=0.2 2023-11-20 00:49:27,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=873873.3333333334, ans=0.2 2023-11-20 00:49:27,997 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 10850, loss[loss=0.07091, simple_loss=0.08734, pruned_loss=0.01678, audio_tagging_loss=0.01046, over 15395.00 frames. ], tot_loss[loss=0.0839, simple_loss=0.1046, pruned_loss=0.02166, audio_tagging_loss=0.009931, over 3037789.62 frames. 
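The audio_tagging_loss term in every loss line comes from the auxiliary tagging head (do_audio_tagging with scale 1.0); the header loads a BEATs checkpoint, which suggests, though the log does not confirm, that BEATs posteriors serve as soft teacher targets. For multi-label tagging over AudioSet's 527 classes, binary cross-entropy over the head's logits is the standard choice; a hedged sketch:

```python
import torch
import torch.nn.functional as F

def audio_tagging_loss(student_logits, teacher_probs, scale=1.0):
    """Multi-label BCE between the tagging head and soft teacher targets.

    Shapes: (batch, num_classes). Using BEATs posteriors as targets is an
    assumption based on the run configuration, not something the log states.
    """
    return scale * F.binary_cross_entropy_with_logits(student_logits, teacher_probs)

student = torch.randn(8, 527)   # tagging-head logits
teacher = torch.rand(8, 527)    # e.g. BEATs posterior probabilities
print(audio_tagging_loss(student, teacher))
```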
], batch size: 57, lr: 6.18e-03, grad_scale: 16.0 2023-11-20 00:49:42,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=873940.0, ans=0.125 2023-11-20 00:49:50,588 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 131100 2023-11-20 00:50:25,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=874140.0, ans=0.0 2023-11-20 00:50:30,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=874140.0, ans=0.0 2023-11-20 00:50:32,917 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 10900, loss[loss=0.09152, simple_loss=0.1109, pruned_loss=0.02565, audio_tagging_loss=0.01042, over 15327.00 frames. ], tot_loss[loss=0.08383, simple_loss=0.1046, pruned_loss=0.02163, audio_tagging_loss=0.009895, over 3035648.59 frames. ], batch size: 56, lr: 6.18e-03, grad_scale: 16.0 2023-11-20 00:50:32,956 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 00:50:39,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=874206.6666666666, ans=0.125 2023-11-20 00:50:44,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=874273.3333333334, ans=0.0 2023-11-20 00:50:54,861 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 131150 2023-11-20 00:51:02,933 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.96 vs. limit=22.5 2023-11-20 00:51:13,346 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.251e+01 8.753e+01 9.767e+01 1.236e+02, threshold=1.751e+02, percent-clipped=0.0 2023-11-20 00:51:31,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=874473.3333333334, ans=0.2 2023-11-20 00:51:36,267 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 10950, loss[loss=0.08689, simple_loss=0.1076, pruned_loss=0.02355, audio_tagging_loss=0.009525, over 15608.00 frames. ], tot_loss[loss=0.08261, simple_loss=0.1028, pruned_loss=0.02115, audio_tagging_loss=0.01008, over 3033419.71 frames. 
], batch size: 59, lr: 6.18e-03, grad_scale: 16.0 2023-11-20 00:51:53,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=874606.6666666666, ans=0.0 2023-11-20 00:51:58,409 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 131200 2023-11-20 00:52:03,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=874673.3333333334, ans=0.125 2023-11-20 00:52:12,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=874673.3333333334, ans=0.0 2023-11-20 00:52:35,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=874806.6666666666, ans=0.2 2023-11-20 00:52:41,629 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 11000, loss[loss=0.08171, simple_loss=0.1054, pruned_loss=0.01966, audio_tagging_loss=0.00938, over 14280.00 frames. ], tot_loss[loss=0.08294, simple_loss=0.1028, pruned_loss=0.02128, audio_tagging_loss=0.01023, over 3038697.56 frames. ], batch size: 53, lr: 6.18e-03, grad_scale: 16.0 2023-11-20 00:52:46,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=874873.3333333334, ans=0.2 2023-11-20 00:52:56,463 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 00:52:56,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=874940.0, ans=0.125 2023-11-20 00:52:58,442 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.76 vs. limit=22.5 2023-11-20 00:53:00,420 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 00:53:04,400 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 131250 2023-11-20 00:53:15,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=875006.6666666666, ans=0.125 2023-11-20 00:53:18,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=875006.6666666666, ans=0.125 2023-11-20 00:53:21,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=875073.3333333334, ans=0.2 2023-11-20 00:53:22,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=875073.3333333334, ans=0.0 2023-11-20 00:53:23,520 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.815e+01 8.049e+01 8.869e+01 9.421e+01 1.178e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-20 00:53:24,094 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.01 vs. 
limit=12.0 2023-11-20 00:53:33,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=875140.0, ans=0.125 2023-11-20 00:53:38,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=875140.0, ans=0.1 2023-11-20 00:53:46,583 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 11050, loss[loss=0.0803, simple_loss=0.09807, pruned_loss=0.01875, audio_tagging_loss=0.01252, over 16035.00 frames. ], tot_loss[loss=0.08309, simple_loss=0.103, pruned_loss=0.02124, audio_tagging_loss=0.01034, over 3043313.05 frames. ], batch size: 62, lr: 6.18e-03, grad_scale: 8.0 2023-11-20 00:54:08,901 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 131300 2023-11-20 00:54:15,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=875340.0, ans=0.125 2023-11-20 00:54:29,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=875406.6666666666, ans=0.035 2023-11-20 00:54:51,015 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 11100, loss[loss=0.09007, simple_loss=0.1063, pruned_loss=0.02261, audio_tagging_loss=0.01433, over 15565.00 frames. ], tot_loss[loss=0.08332, simple_loss=0.1033, pruned_loss=0.02122, audio_tagging_loss=0.01047, over 3047149.19 frames. ], batch size: 58, lr: 6.18e-03, grad_scale: 8.0 2023-11-20 00:55:12,455 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 131350 2023-11-20 00:55:18,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=875673.3333333334, ans=0.125 2023-11-20 00:55:20,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=875673.3333333334, ans=0.05 2023-11-20 00:55:21,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=875673.3333333334, ans=0.125 2023-11-20 00:55:33,841 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.519e+01 8.391e+01 9.002e+01 9.833e+01 1.655e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-20 00:55:37,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=875740.0, ans=0.125 2023-11-20 00:55:55,295 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 11150, loss[loss=0.0584, simple_loss=0.06803, pruned_loss=0.01117, audio_tagging_loss=0.01321, over 15228.00 frames. ], tot_loss[loss=0.08365, simple_loss=0.1033, pruned_loss=0.02147, audio_tagging_loss=0.0105, over 3048162.17 frames. ], batch size: 59, lr: 6.17e-03, grad_scale: 8.0 2023-11-20 00:56:07,645 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.45 vs. 
limit=15.0 2023-11-20 00:56:09,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=875940.0, ans=0.1 2023-11-20 00:56:17,354 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 131400 2023-11-20 00:56:17,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=875940.0, ans=0.125 2023-11-20 00:56:20,556 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.26 vs. limit=15.0 2023-11-20 00:56:23,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=876006.6666666666, ans=0.125 2023-11-20 00:56:29,961 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.20 vs. limit=15.0 2023-11-20 00:56:38,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=876073.3333333334, ans=0.0 2023-11-20 00:56:38,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=876073.3333333334, ans=0.1 2023-11-20 00:56:45,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=876073.3333333334, ans=0.2 2023-11-20 00:56:48,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=876140.0, ans=0.0 2023-11-20 00:57:00,000 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 11200, loss[loss=0.06339, simple_loss=0.08071, pruned_loss=0.01256, audio_tagging_loss=0.01048, over 14533.00 frames. ], tot_loss[loss=0.08259, simple_loss=0.1018, pruned_loss=0.02106, audio_tagging_loss=0.01062, over 3053209.31 frames. ], batch size: 54, lr: 6.17e-03, grad_scale: 16.0 2023-11-20 00:57:15,206 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.27 vs. limit=15.0 2023-11-20 00:57:22,039 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 131450 2023-11-20 00:57:42,709 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.062e+01 8.697e+01 9.573e+01 1.140e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-20 00:57:44,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=876406.6666666666, ans=0.0 2023-11-20 00:58:03,358 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 00:58:04,919 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 11250, loss[loss=0.07496, simple_loss=0.08633, pruned_loss=0.01949, audio_tagging_loss=0.01231, over 15402.00 frames. ], tot_loss[loss=0.08252, simple_loss=0.1015, pruned_loss=0.02112, audio_tagging_loss=0.01065, over 3046467.22 frames. 
], batch size: 59, lr: 6.17e-03, grad_scale: 16.0 2023-11-20 00:58:15,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=876540.0, ans=0.125 2023-11-20 00:58:26,622 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 131500 2023-11-20 00:58:28,355 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.76 vs. limit=15.0 2023-11-20 00:58:45,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=876740.0, ans=0.125 2023-11-20 00:58:48,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=876740.0, ans=0.125 2023-11-20 00:59:09,615 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 11300, loss[loss=0.08992, simple_loss=0.1069, pruned_loss=0.02604, audio_tagging_loss=0.01044, over 15441.00 frames. ], tot_loss[loss=0.08327, simple_loss=0.1027, pruned_loss=0.02154, audio_tagging_loss=0.01039, over 3043002.49 frames. ], batch size: 61, lr: 6.17e-03, grad_scale: 16.0 2023-11-20 00:59:12,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=876873.3333333334, ans=0.125 2023-11-20 00:59:12,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=876873.3333333334, ans=0.1 2023-11-20 00:59:14,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=876873.3333333334, ans=0.015 2023-11-20 00:59:16,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=876873.3333333334, ans=0.125 2023-11-20 00:59:31,657 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 131550 2023-11-20 00:59:53,572 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.175e+01 8.098e+01 8.659e+01 9.613e+01 1.705e+02, threshold=1.732e+02, percent-clipped=0.0 2023-11-20 00:59:59,226 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0 2023-11-20 01:00:14,030 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 11350, loss[loss=0.07282, simple_loss=0.08188, pruned_loss=0.01735, audio_tagging_loss=0.01454, over 14952.00 frames. ], tot_loss[loss=0.08295, simple_loss=0.1025, pruned_loss=0.02148, audio_tagging_loss=0.01023, over 3038883.52 frames. ], batch size: 56, lr: 6.17e-03, grad_scale: 8.0 2023-11-20 01:00:33,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=877273.3333333334, ans=0.0 2023-11-20 01:00:35,826 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 131600 2023-11-20 01:00:35,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=877273.3333333334, ans=0.125 2023-11-20 01:00:56,653 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.55 vs. limit=22.5 2023-11-20 01:01:18,976 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 11400, loss[loss=0.08843, simple_loss=0.1112, pruned_loss=0.02147, audio_tagging_loss=0.01135, over 15304.00 frames. 
], tot_loss[loss=0.08321, simple_loss=0.1031, pruned_loss=0.02159, audio_tagging_loss=0.01009, over 3032987.58 frames. ], batch size: 57, lr: 6.17e-03, grad_scale: 8.0 2023-11-20 01:01:31,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=877606.6666666666, ans=0.0 2023-11-20 01:01:40,739 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 131650 2023-11-20 01:01:51,659 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.830e-01 2023-11-20 01:02:02,817 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 8.241e+01 9.056e+01 1.011e+02 3.989e+02, threshold=1.811e+02, percent-clipped=1.0 2023-11-20 01:02:23,658 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 11450, loss[loss=0.104, simple_loss=0.1439, pruned_loss=0.02309, audio_tagging_loss=0.008986, over 15276.00 frames. ], tot_loss[loss=0.08325, simple_loss=0.1031, pruned_loss=0.02168, audio_tagging_loss=0.01003, over 3036947.73 frames. ], batch size: 56, lr: 6.17e-03, grad_scale: 8.0 2023-11-20 01:02:45,837 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 131700 2023-11-20 01:02:51,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=878006.6666666666, ans=0.0 2023-11-20 01:02:59,651 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.09 vs. limit=22.5 2023-11-20 01:03:04,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=878073.3333333334, ans=0.1 2023-11-20 01:03:14,431 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.27 vs. limit=22.5 2023-11-20 01:03:19,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=878140.0, ans=0.125 2023-11-20 01:03:24,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=878140.0, ans=0.125 2023-11-20 01:03:27,748 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 11500, loss[loss=0.0839, simple_loss=0.1037, pruned_loss=0.02031, audio_tagging_loss=0.01175, over 14219.00 frames. ], tot_loss[loss=0.0827, simple_loss=0.1023, pruned_loss=0.0214, audio_tagging_loss=0.01013, over 3041574.42 frames. ], batch size: 53, lr: 6.17e-03, grad_scale: 8.0 2023-11-20 01:03:28,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=878206.6666666666, ans=0.0 2023-11-20 01:03:49,235 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 131750 2023-11-20 01:04:11,076 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.409e+01 8.478e+01 9.308e+01 1.005e+02 1.725e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-20 01:04:12,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=878406.6666666666, ans=0.95 2023-11-20 01:04:21,082 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.50 vs. 
limit=12.0 2023-11-20 01:04:27,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=878473.3333333334, ans=0.0 2023-11-20 01:04:29,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=878473.3333333334, ans=0.125 2023-11-20 01:04:31,789 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 11550, loss[loss=0.05954, simple_loss=0.07126, pruned_loss=0.0127, audio_tagging_loss=0.01121, over 14599.00 frames. ], tot_loss[loss=0.08275, simple_loss=0.1024, pruned_loss=0.02144, audio_tagging_loss=0.01009, over 3047148.80 frames. ], batch size: 58, lr: 6.17e-03, grad_scale: 8.0 2023-11-20 01:04:54,044 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 131800 2023-11-20 01:04:55,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=878606.6666666666, ans=0.2 2023-11-20 01:04:59,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=878673.3333333334, ans=0.125 2023-11-20 01:05:07,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=878673.3333333334, ans=0.125 2023-11-20 01:05:15,173 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 01:05:19,550 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.92 vs. limit=15.0 2023-11-20 01:05:36,270 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 11600, loss[loss=0.06626, simple_loss=0.08504, pruned_loss=0.01579, audio_tagging_loss=0.007943, over 15913.00 frames. ], tot_loss[loss=0.08289, simple_loss=0.1029, pruned_loss=0.0215, audio_tagging_loss=0.009958, over 3050220.47 frames. ], batch size: 58, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:05:58,384 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 131850 2023-11-20 01:05:58,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=878940.0, ans=0.125 2023-11-20 01:06:02,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=879006.6666666666, ans=0.125 2023-11-20 01:06:09,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=879006.6666666666, ans=0.0 2023-11-20 01:06:19,907 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.853e+01 8.090e+01 8.633e+01 9.438e+01 1.426e+02, threshold=1.727e+02, percent-clipped=0.0 2023-11-20 01:06:30,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=879140.0, ans=0.0 2023-11-20 01:06:40,312 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.28 vs. 
limit=12.0 2023-11-20 01:06:40,883 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 11650, loss[loss=0.06978, simple_loss=0.09023, pruned_loss=0.01334, audio_tagging_loss=0.01132, over 15102.00 frames. ], tot_loss[loss=0.0835, simple_loss=0.1037, pruned_loss=0.02164, audio_tagging_loss=0.01002, over 3054500.02 frames. ], batch size: 57, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:07:03,001 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 131900 2023-11-20 01:07:05,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=879340.0, ans=0.2 2023-11-20 01:07:09,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=879340.0, ans=0.125 2023-11-20 01:07:14,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=879340.0, ans=0.125 2023-11-20 01:07:20,213 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.57 vs. limit=15.0 2023-11-20 01:07:21,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=879406.6666666666, ans=0.0 2023-11-20 01:07:45,863 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 11700, loss[loss=0.06587, simple_loss=0.07093, pruned_loss=0.01484, audio_tagging_loss=0.01557, over 15114.00 frames. ], tot_loss[loss=0.08325, simple_loss=0.1032, pruned_loss=0.02156, audio_tagging_loss=0.01011, over 3052160.34 frames. ], batch size: 58, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:07:51,490 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.64 vs. limit=22.5 2023-11-20 01:08:07,450 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 131950 2023-11-20 01:08:10,435 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.10 vs. limit=15.0 2023-11-20 01:08:30,055 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.705e+01 8.385e+01 9.019e+01 1.002e+02 1.324e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-20 01:08:45,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=879806.6666666666, ans=0.125 2023-11-20 01:08:49,743 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 11750, loss[loss=0.1032, simple_loss=0.1381, pruned_loss=0.02519, audio_tagging_loss=0.008973, over 14215.00 frames. ], tot_loss[loss=0.08393, simple_loss=0.1042, pruned_loss=0.02176, audio_tagging_loss=0.01005, over 3049996.86 frames. ], batch size: 53, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:09:10,514 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.00 vs. 
limit=15.0 2023-11-20 01:09:12,287 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 132000 2023-11-20 01:09:13,728 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-132000.pt 2023-11-20 01:09:45,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=880140.0, ans=0.1 2023-11-20 01:09:58,234 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 11800, loss[loss=0.06226, simple_loss=0.08542, pruned_loss=0.01098, audio_tagging_loss=0.008573, over 16201.00 frames. ], tot_loss[loss=0.08333, simple_loss=0.1035, pruned_loss=0.02145, audio_tagging_loss=0.01015, over 3052152.81 frames. ], batch size: 62, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:10:11,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=880273.3333333334, ans=0.125 2023-11-20 01:10:20,409 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 132050 2023-11-20 01:10:41,407 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.45 vs. limit=15.0 2023-11-20 01:10:41,793 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.087e+01 8.587e+01 9.267e+01 9.920e+01 1.513e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-20 01:10:45,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=880406.6666666666, ans=0.0 2023-11-20 01:11:00,364 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.79 vs. limit=22.5 2023-11-20 01:11:02,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=880540.0, ans=0.125 2023-11-20 01:11:03,535 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 11850, loss[loss=0.05462, simple_loss=0.06385, pruned_loss=0.01122, audio_tagging_loss=0.01148, over 15217.00 frames. ], tot_loss[loss=0.08364, simple_loss=0.1036, pruned_loss=0.0216, audio_tagging_loss=0.01022, over 3050949.91 frames. ], batch size: 59, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:11:06,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=880540.0, ans=0.0 2023-11-20 01:11:12,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=880540.0, ans=0.125 2023-11-20 01:11:24,977 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 132100 2023-11-20 01:11:43,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=880740.0, ans=0.0 2023-11-20 01:11:48,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=880740.0, ans=0.125 2023-11-20 01:11:53,861 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.54 vs. 
limit=22.5 2023-11-20 01:11:59,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=880806.6666666666, ans=0.125 2023-11-20 01:12:06,497 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 11900, loss[loss=0.08597, simple_loss=0.1114, pruned_loss=0.01855, audio_tagging_loss=0.01174, over 16209.00 frames. ], tot_loss[loss=0.08312, simple_loss=0.1027, pruned_loss=0.02135, audio_tagging_loss=0.01043, over 3045420.65 frames. ], batch size: 58, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:12:18,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=880940.0, ans=0.1 2023-11-20 01:12:28,448 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 132150 2023-11-20 01:12:32,156 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.25 vs. limit=15.0 2023-11-20 01:12:34,630 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.86 vs. limit=10.0 2023-11-20 01:12:39,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=881006.6666666666, ans=0.125 2023-11-20 01:12:50,266 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.260e+01 8.333e+01 8.992e+01 9.854e+01 1.328e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-20 01:13:10,599 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 11950, loss[loss=0.1006, simple_loss=0.1163, pruned_loss=0.03064, audio_tagging_loss=0.01179, over 14967.00 frames. ], tot_loss[loss=0.08315, simple_loss=0.1027, pruned_loss=0.02134, audio_tagging_loss=0.01047, over 3044661.13 frames. ], batch size: 55, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:13:23,529 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.12 vs. limit=12.0 2023-11-20 01:13:33,296 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 132200 2023-11-20 01:13:43,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=881340.0, ans=0.1 2023-11-20 01:14:09,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=881473.3333333334, ans=0.2 2023-11-20 01:14:11,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=881473.3333333334, ans=0.125 2023-11-20 01:14:13,600 INFO [train_asr.py:1262] (0/4) Epoch 11, batch 12000, loss[loss=0.07734, simple_loss=0.09519, pruned_loss=0.01707, audio_tagging_loss=0.01268, over 15336.00 frames. ], tot_loss[loss=0.08298, simple_loss=0.1023, pruned_loss=0.02128, audio_tagging_loss=0.01057, over 3043944.73 frames. ], batch size: 58, lr: 6.15e-03, grad_scale: 32.0 2023-11-20 01:14:13,603 INFO [train_asr.py:1285] (0/4) Computing validation loss 2023-11-20 01:14:36,723 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.6268, 3.2620, 2.7706, 3.3617], device='cuda:0') 2023-11-20 01:14:57,674 INFO [train_asr.py:1294] (0/4) Epoch 11, validation: loss=0.06362, simple_loss=0.05468, pruned_loss=0.006127, audio_tagging_loss=0.03015, over 4681554.00 frames. 
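The loss figures in these train/validation entries recombine as a fixed weighted sum of the logged components: with the startup hyperparameters simple_loss_scale=0.5 and audio_tagging_loss_scale=1.0 (CTC disabled), total loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss, and the validation entry just above reproduces to within rounding. A minimal sketch of that bookkeeping, assuming this combination (combine_losses is an illustrative helper, not a function in train_asr.py):

    # Recombine the logged loss components. The scales follow the
    # hyperparameters printed at startup (simple_loss_scale=0.5,
    # audio_tagging_loss_scale=1.0); combine_losses itself is a
    # hypothetical helper, not code from train_asr.py.
    def combine_losses(simple_loss: float, pruned_loss: float,
                       audio_tagging_loss: float,
                       simple_loss_scale: float = 0.5,
                       audio_tagging_loss_scale: float = 1.0) -> float:
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    # Validation entry above: loss=0.06362, simple_loss=0.05468,
    # pruned_loss=0.006127, audio_tagging_loss=0.03015.
    assert abs(combine_losses(0.05468, 0.006127, 0.03015) - 0.06362) < 1e-4

The same identity holds for the running tot_loss entries, e.g. 0.5 * 0.103 + 0.0214 + 0.01008 = 0.083 for batch 10700.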
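The optim.py entries pair grad-norm quartiles (min/25%/median/75%/max over recent batches) with the active clipping threshold, and throughout this section the threshold equals Clipping_scale times the logged median, e.g. 2.0 * 8.992e+01 = 1.798e+02 in the entry above. A hedged reconstruction of that relationship (the windowing and exact statistics kept by icefall's optim.py are assumptions, not copied code):

    import statistics

    # Clip threshold as suggested by the log lines: clipping_scale times
    # the median of recently observed gradient norms. How optim.py
    # actually windows and updates these statistics is assumed here.
    def clip_threshold(recent_grad_norms: list[float],
                       clipping_scale: float = 2.0) -> float:
        return clipping_scale * statistics.median(recent_grad_norms)

    # Quartiles from the entry above: 7.260e+01 8.333e+01 8.992e+01
    # 9.854e+01 1.328e+02 -> threshold 1.798e+02.
    print(clip_threshold([72.60, 83.33, 89.92, 98.54, 132.8]))  # 179.84

percent-clipped reports how often that threshold actually fired; it stays at 0.0 for most entries here, ticking to 1.0 only when the max grad-norm spikes well past the threshold (e.g. 3.989e+02 vs. 1.811e+02 around batch 11400).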
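The recurring WARNING lines drop AudioSet placeholder cuts because, after subsampling, fewer acoustic frames remain (23) than BPE tokens to align (24), and the pruned transducer recipe requires at least as many frames as tokens. A sketch of that validity check, assuming the frame count after subsampling follows the logged pair 100 -> 23 (the exact formula in train_asr.py may differ):

    # Drop cuts whose token sequence outgrows the subsampled frame
    # sequence, as in the "Exclude cut ..." warnings. The subsampling
    # arithmetic below is inferred from the logged 100 -> 23 pair and
    # is an assumption, not code from train_asr.py.
    def keep_cut(num_frames: int, num_tokens: int,
                 subsampling_factor: int = 4) -> bool:
        frames_after = (num_frames - 7) // subsampling_factor  # assumed
        return frames_after >= num_tokens

    print(keep_cut(100, 24))  # False -> excluded, matching the warnings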
2023-11-20 01:14:57,675 INFO [train_asr.py:1295] (0/4) Maximum memory allocated so far is 25925MB 2023-11-20 01:15:16,903 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2023-11-20 01:15:17,674 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 132250 2023-11-20 01:15:25,635 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.44 vs. limit=12.0 2023-11-20 01:15:31,207 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-11.pt 2023-11-20 01:16:05,095 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 0, loss[loss=0.08492, simple_loss=0.08903, pruned_loss=0.01245, audio_tagging_loss=0.02796, over 15023.00 frames. ], tot_loss[loss=0.08492, simple_loss=0.08903, pruned_loss=0.01245, audio_tagging_loss=0.02796, over 15023.00 frames. ], batch size: 55, lr: 5.90e-03, grad_scale: 32.0 2023-11-20 01:16:05,098 INFO [train_asr.py:1285] (0/4) Computing validation loss 2023-11-20 01:16:34,998 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5036, 2.7234, 3.6965, 3.1572], device='cuda:0') 2023-11-20 01:16:42,307 INFO [train_asr.py:1294] (0/4) Epoch 12, validation: loss=0.06246, simple_loss=0.05467, pruned_loss=0.006079, audio_tagging_loss=0.02904, over 4681554.00 frames. 2023-11-20 01:16:42,308 INFO [train_asr.py:1295] (0/4) Maximum memory allocated so far is 25925MB 2023-11-20 01:16:51,495 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.342e+01 8.202e+01 8.941e+01 9.888e+01 1.289e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-20 01:16:55,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=881786.6666666666, ans=0.95 2023-11-20 01:16:59,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=881786.6666666666, ans=0.0 2023-11-20 01:17:31,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=881920.0, ans=0.125 2023-11-20 01:17:34,461 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 132300 2023-11-20 01:17:47,172 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 50, loss[loss=0.1037, simple_loss=0.1119, pruned_loss=0.02775, audio_tagging_loss=0.02, over 14837.00 frames. ], tot_loss[loss=0.0912, simple_loss=0.1013, pruned_loss=0.02043, audio_tagging_loss=0.0201, over 683237.06 frames. 
], batch size: 56, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:17:50,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=882053.3333333334, ans=0.0 2023-11-20 01:18:10,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=882120.0, ans=0.125 2023-11-20 01:18:30,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=882253.3333333334, ans=0.2 2023-11-20 01:18:35,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=882253.3333333334, ans=0.1 2023-11-20 01:18:39,742 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 132350 2023-11-20 01:18:41,616 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.31 vs. limit=10.0 2023-11-20 01:18:52,568 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 100, loss[loss=0.1008, simple_loss=0.1245, pruned_loss=0.02661, audio_tagging_loss=0.01196, over 15475.00 frames. ], tot_loss[loss=0.09139, simple_loss=0.103, pruned_loss=0.02083, audio_tagging_loss=0.01905, over 1202270.94 frames. ], batch size: 55, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:18:59,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=882386.6666666666, ans=0.0 2023-11-20 01:19:01,651 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.733e+01 8.794e+01 9.349e+01 1.020e+02 1.692e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-20 01:19:13,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=882453.3333333334, ans=0.05 2023-11-20 01:19:28,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=882520.0, ans=0.125 2023-11-20 01:19:30,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=882586.6666666666, ans=0.125 2023-11-20 01:19:44,820 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 132400 2023-11-20 01:19:53,048 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.13 vs. limit=15.0 2023-11-20 01:19:57,300 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 150, loss[loss=0.1078, simple_loss=0.1353, pruned_loss=0.03003, audio_tagging_loss=0.01009, over 14913.00 frames. ], tot_loss[loss=0.09007, simple_loss=0.1037, pruned_loss=0.02131, audio_tagging_loss=0.01691, over 1613301.89 frames. 
], batch size: 54, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:20:22,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=882853.3333333334, ans=0.125 2023-11-20 01:20:25,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=882853.3333333334, ans=15.0 2023-11-20 01:20:49,219 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 132450 2023-11-20 01:21:00,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=882986.6666666666, ans=0.0 2023-11-20 01:21:02,277 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 200, loss[loss=0.0969, simple_loss=0.1186, pruned_loss=0.02808, audio_tagging_loss=0.009503, over 16256.00 frames. ], tot_loss[loss=0.08888, simple_loss=0.1046, pruned_loss=0.02163, audio_tagging_loss=0.01496, over 1937466.46 frames. ], batch size: 59, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:21:11,575 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.067e+01 8.201e+01 8.761e+01 9.540e+01 1.328e+02, threshold=1.752e+02, percent-clipped=0.0 2023-11-20 01:21:27,812 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 01:21:48,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=883253.3333333334, ans=0.125 2023-11-20 01:21:50,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=883253.3333333334, ans=0.125 2023-11-20 01:21:54,063 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 132500 2023-11-20 01:22:06,777 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 250, loss[loss=0.1105, simple_loss=0.1525, pruned_loss=0.02725, audio_tagging_loss=0.007017, over 15861.00 frames. ], tot_loss[loss=0.08701, simple_loss=0.1039, pruned_loss=0.0215, audio_tagging_loss=0.01354, over 2185935.71 frames. ], batch size: 56, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:22:16,776 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.44 vs. limit=15.0 2023-11-20 01:22:25,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=883453.3333333334, ans=0.0 2023-11-20 01:22:36,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=883520.0, ans=0.125 2023-11-20 01:22:49,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=883586.6666666666, ans=0.1 2023-11-20 01:22:50,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=883586.6666666666, ans=0.125 2023-11-20 01:22:58,128 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 132550 2023-11-20 01:23:11,507 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 300, loss[loss=0.07481, simple_loss=0.08851, pruned_loss=0.01935, audio_tagging_loss=0.0112, over 16062.00 frames. ], tot_loss[loss=0.08615, simple_loss=0.1042, pruned_loss=0.02156, audio_tagging_loss=0.01249, over 2373648.96 frames. 
], batch size: 62, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:23:20,188 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.079e+01 8.204e+01 9.028e+01 9.850e+01 1.789e+02, threshold=1.806e+02, percent-clipped=1.0 2023-11-20 01:23:37,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=883853.3333333334, ans=0.5 2023-11-20 01:23:44,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=883853.3333333334, ans=0.125 2023-11-20 01:23:46,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=883853.3333333334, ans=0.0 2023-11-20 01:23:53,982 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.36 vs. limit=15.0 2023-11-20 01:24:00,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=883920.0, ans=0.09899494936611666 2023-11-20 01:24:03,121 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 132600 2023-11-20 01:24:16,344 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 350, loss[loss=0.08858, simple_loss=0.1186, pruned_loss=0.0202, audio_tagging_loss=0.009071, over 14781.00 frames. ], tot_loss[loss=0.08514, simple_loss=0.1042, pruned_loss=0.0213, audio_tagging_loss=0.01173, over 2521780.11 frames. ], batch size: 56, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:24:46,025 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.45 vs. limit=15.0 2023-11-20 01:24:58,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=884253.3333333334, ans=0.1 2023-11-20 01:25:08,178 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 132650 2023-11-20 01:25:21,024 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 400, loss[loss=0.06207, simple_loss=0.0733, pruned_loss=0.01352, audio_tagging_loss=0.01191, over 15481.00 frames. ], tot_loss[loss=0.08442, simple_loss=0.1037, pruned_loss=0.02117, audio_tagging_loss=0.01138, over 2640163.26 frames. ], batch size: 58, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:25:30,267 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.352e+01 8.151e+01 8.736e+01 9.522e+01 1.340e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-20 01:26:13,043 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 132700 2023-11-20 01:26:14,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=884653.3333333334, ans=0.0 2023-11-20 01:26:26,594 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 450, loss[loss=0.05865, simple_loss=0.06843, pruned_loss=0.01292, audio_tagging_loss=0.01152, over 14786.00 frames. ], tot_loss[loss=0.08318, simple_loss=0.1026, pruned_loss=0.02083, audio_tagging_loss=0.01104, over 2728587.80 frames. ], batch size: 57, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:26:33,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=884720.0, ans=0.0 2023-11-20 01:26:39,617 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.20 vs. 
limit=15.0 2023-11-20 01:27:17,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=884986.6666666666, ans=0.025 2023-11-20 01:27:18,803 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 132750 2023-11-20 01:27:23,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=884986.6666666666, ans=0.125 2023-11-20 01:27:25,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=884986.6666666666, ans=0.0 2023-11-20 01:27:31,588 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 500, loss[loss=0.07677, simple_loss=0.1037, pruned_loss=0.01647, audio_tagging_loss=0.008457, over 15791.00 frames. ], tot_loss[loss=0.08225, simple_loss=0.1017, pruned_loss=0.02061, audio_tagging_loss=0.01079, over 2797961.73 frames. ], batch size: 59, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:27:40,027 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.508e+01 8.200e+01 8.678e+01 9.366e+01 1.155e+02, threshold=1.736e+02, percent-clipped=0.0 2023-11-20 01:28:03,101 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.12 vs. limit=10.0 2023-11-20 01:28:10,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=885253.3333333334, ans=0.1 2023-11-20 01:28:23,644 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 132800 2023-11-20 01:28:30,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=885320.0, ans=0.125 2023-11-20 01:28:36,876 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 550, loss[loss=0.07991, simple_loss=0.1023, pruned_loss=0.01889, audio_tagging_loss=0.009858, over 14588.00 frames. ], tot_loss[loss=0.08206, simple_loss=0.1017, pruned_loss=0.02066, audio_tagging_loss=0.01056, over 2855459.75 frames. ], batch size: 56, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:28:47,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=885386.6666666666, ans=0.0 2023-11-20 01:29:01,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=885520.0, ans=0.125 2023-11-20 01:29:28,484 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 132850 2023-11-20 01:29:40,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=885720.0, ans=0.125 2023-11-20 01:29:41,419 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 600, loss[loss=0.1062, simple_loss=0.1295, pruned_loss=0.03235, audio_tagging_loss=0.009067, over 15618.00 frames. ], tot_loss[loss=0.08202, simple_loss=0.1017, pruned_loss=0.02065, audio_tagging_loss=0.01053, over 2899039.97 frames. 
], batch size: 56, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:29:50,453 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.996e+01 8.266e+01 9.047e+01 9.748e+01 1.324e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-20 01:30:06,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=885853.3333333334, ans=15.0 2023-11-20 01:30:24,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=885920.0, ans=10.0 2023-11-20 01:30:32,863 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 132900 2023-11-20 01:30:33,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=885986.6666666666, ans=0.125 2023-11-20 01:30:44,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=886053.3333333334, ans=0.2 2023-11-20 01:30:45,521 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 650, loss[loss=0.08723, simple_loss=0.1132, pruned_loss=0.02095, audio_tagging_loss=0.009657, over 15729.00 frames. ], tot_loss[loss=0.08262, simple_loss=0.1023, pruned_loss=0.02093, audio_tagging_loss=0.01052, over 2932350.91 frames. ], batch size: 56, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:30:51,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=886053.3333333334, ans=0.2 2023-11-20 01:30:56,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=886053.3333333334, ans=0.125 2023-11-20 01:31:15,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=886186.6666666666, ans=0.0 2023-11-20 01:31:22,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=886186.6666666666, ans=0.125 2023-11-20 01:31:33,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=886253.3333333334, ans=0.125 2023-11-20 01:31:34,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=886253.3333333334, ans=0.125 2023-11-20 01:31:38,573 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 132950 2023-11-20 01:31:43,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=886320.0, ans=0.1 2023-11-20 01:31:47,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=886320.0, ans=0.05 2023-11-20 01:31:51,408 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 700, loss[loss=0.09159, simple_loss=0.1122, pruned_loss=0.02471, audio_tagging_loss=0.01078, over 14865.00 frames. ], tot_loss[loss=0.08277, simple_loss=0.1026, pruned_loss=0.02093, audio_tagging_loss=0.01052, over 2962012.87 frames. 
], batch size: 56, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:32:00,265 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.620e+01 8.108e+01 8.721e+01 9.361e+01 1.160e+02, threshold=1.744e+02, percent-clipped=0.0 2023-11-20 01:32:02,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=886453.3333333334, ans=0.0 2023-11-20 01:32:24,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=886520.0, ans=0.0 2023-11-20 01:32:43,731 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 133000 2023-11-20 01:32:43,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=886653.3333333334, ans=0.0 2023-11-20 01:32:51,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=886653.3333333334, ans=0.0 2023-11-20 01:32:56,795 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 750, loss[loss=0.09655, simple_loss=0.1229, pruned_loss=0.02476, audio_tagging_loss=0.01036, over 14881.00 frames. ], tot_loss[loss=0.08308, simple_loss=0.1028, pruned_loss=0.02113, audio_tagging_loss=0.01055, over 2978454.01 frames. ], batch size: 56, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:33:05,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=886720.0, ans=0.1 2023-11-20 01:33:09,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=886786.6666666666, ans=0.125 2023-11-20 01:33:12,528 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.96 vs. limit=15.0 2023-11-20 01:33:19,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=886786.6666666666, ans=0.0 2023-11-20 01:33:27,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=886853.3333333334, ans=0.1 2023-11-20 01:33:34,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=886920.0, ans=0.2 2023-11-20 01:33:35,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=886920.0, ans=0.0 2023-11-20 01:33:48,614 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 133050 2023-11-20 01:34:00,836 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 800, loss[loss=0.07481, simple_loss=0.09558, pruned_loss=0.01604, audio_tagging_loss=0.01098, over 15070.00 frames. ], tot_loss[loss=0.0833, simple_loss=0.103, pruned_loss=0.02128, audio_tagging_loss=0.01052, over 2996894.60 frames. 
], batch size: 58, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:34:07,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=887053.3333333334, ans=0.2 2023-11-20 01:34:10,154 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.686e+01 8.461e+01 9.039e+01 1.027e+02 1.682e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-20 01:34:10,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=887053.3333333334, ans=0.125 2023-11-20 01:34:10,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=887053.3333333334, ans=0.125 2023-11-20 01:34:28,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=887186.6666666666, ans=0.125 2023-11-20 01:34:52,120 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 133100 2023-11-20 01:34:52,774 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.96 vs. limit=6.0 2023-11-20 01:35:00,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=887320.0, ans=0.09899494936611666 2023-11-20 01:35:05,304 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 850, loss[loss=0.08469, simple_loss=0.09591, pruned_loss=0.02416, audio_tagging_loss=0.01257, over 14626.00 frames. ], tot_loss[loss=0.08355, simple_loss=0.1032, pruned_loss=0.02141, audio_tagging_loss=0.01052, over 3006790.05 frames. ], batch size: 57, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:35:08,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=887386.6666666666, ans=0.125 2023-11-20 01:35:13,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=887386.6666666666, ans=0.125 2023-11-20 01:35:30,661 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.42 vs. limit=15.0 2023-11-20 01:35:34,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=887520.0, ans=0.04949747468305833 2023-11-20 01:35:35,674 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.30 vs. limit=15.0 2023-11-20 01:35:36,913 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.47 vs. 
2023-11-20 01:35:43,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=887586.6666666666, ans=0.1
2023-11-20 01:35:46,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=887586.6666666666, ans=0.125
2023-11-20 01:35:57,688 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 133150
2023-11-20 01:35:57,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=887653.3333333334, ans=0.125
2023-11-20 01:35:57,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=887653.3333333334, ans=0.125
2023-11-20 01:36:01,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=887653.3333333334, ans=0.125
2023-11-20 01:36:10,439 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 900, loss[loss=0.0717, simple_loss=0.08882, pruned_loss=0.01678, audio_tagging_loss=0.0105, over 16398.00 frames. ], tot_loss[loss=0.08371, simple_loss=0.1035, pruned_loss=0.02143, audio_tagging_loss=0.01051, over 3016426.94 frames. ], batch size: 62, lr: 5.88e-03, grad_scale: 32.0
2023-11-20 01:36:18,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=887720.0, ans=0.1
2023-11-20 01:36:19,048 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.555e+01 8.224e+01 8.986e+01 9.963e+01 2.180e+02, threshold=1.797e+02, percent-clipped=1.0
2023-11-20 01:37:01,713 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 133200
2023-11-20 01:37:01,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=887986.6666666666, ans=0.0
2023-11-20 01:37:14,458 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 950, loss[loss=0.09848, simple_loss=0.1333, pruned_loss=0.02329, audio_tagging_loss=0.008559, over 16023.00 frames. ], tot_loss[loss=0.08332, simple_loss=0.1034, pruned_loss=0.02131, audio_tagging_loss=0.01033, over 3025907.36 frames. ], batch size: 59, lr: 5.87e-03, grad_scale: 32.0
2023-11-20 01:37:14,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=888053.3333333334, ans=0.0
2023-11-20 01:37:25,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=888053.3333333334, ans=0.125
2023-11-20 01:37:27,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=888120.0, ans=0.125
2023-11-20 01:37:37,988 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0
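The scaling.py ScheduledFloat records show that many regularization constants in this model (skip rates, balancer probabilities, dropout_p, bypass scale_min) are functions of batch_count rather than fixed hyperparameters; by batch_count around 887k most skip rates have decayed to their final values (ans=0.0). A minimal piecewise-linear sketch of such a schedule (the breakpoints below are illustrative, not the recipe's):

    def scheduled_float(batch_count: float, schedule) -> float:
        # schedule: sorted (batch_count, value) breakpoints; linear interpolation
        # between breakpoints, clamped to the end values outside the range.
        b0, v0 = schedule[0]
        if batch_count <= b0:
            return v0
        for b1, v1 in schedule[1:]:
            if batch_count <= b1:
                return v0 + (batch_count - b0) / (b1 - b0) * (v1 - v0)
            b0, v0 = b1, v1
        return v0

    # e.g. a skip rate that decays from 0.2 to 0.0 over the first 20k batches:
    print(scheduled_float(887653.0, [(0.0, 0.2), (20000.0, 0.0)]))  # 0.0, cf. ans=0.0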
2023-11-20 01:37:44,706 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 01:37:53,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=888253.3333333334, ans=0.125
2023-11-20 01:38:00,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=888253.3333333334, ans=0.09899494936611666
2023-11-20 01:38:02,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=888253.3333333334, ans=0.1
2023-11-20 01:38:05,969 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 133250
2023-11-20 01:38:19,478 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 1000, loss[loss=0.07397, simple_loss=0.09633, pruned_loss=0.01713, audio_tagging_loss=0.008668, over 15140.00 frames. ], tot_loss[loss=0.08263, simple_loss=0.1028, pruned_loss=0.02112, audio_tagging_loss=0.0101, over 3023996.55 frames. ], batch size: 59, lr: 5.87e-03, grad_scale: 32.0
2023-11-20 01:38:28,769 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.591e+01 8.070e+01 8.953e+01 9.480e+01 1.441e+02, threshold=1.791e+02, percent-clipped=0.0
2023-11-20 01:38:32,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=888453.3333333334, ans=0.0
2023-11-20 01:38:33,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=888453.3333333334, ans=0.125
2023-11-20 01:38:41,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=888453.3333333334, ans=0.0
2023-11-20 01:38:44,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=888520.0, ans=0.0
2023-11-20 01:38:46,610 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 01:38:49,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=888520.0, ans=0.2
2023-11-20 01:39:11,500 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 133300
2023-11-20 01:39:24,765 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 1050, loss[loss=0.07227, simple_loss=0.09538, pruned_loss=0.01521, audio_tagging_loss=0.009366, over 15282.00 frames. ], tot_loss[loss=0.08247, simple_loss=0.1027, pruned_loss=0.02111, audio_tagging_loss=0.01001, over 3028926.97 frames. ], batch size: 59, lr: 5.87e-03, grad_scale: 32.0
2023-11-20 01:39:43,295 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.08 vs. limit=10.0
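The WARNING above shows why the 1-second AudioSet placeholder cuts keep being dropped: 100 fbank frames shrink to 23 encoder frames under the roughly 4x subsampling, which is fewer than the 24 BPE tokens of the dummy transcript, so no transducer alignment exists. A sketch of the check (the exact length formula is an assumption chosen to match the logged 100 -> 23):

    def frames_after_subsampling(num_frames: int) -> int:
        # Assumed Conv2dSubsampling-style length map, (T - 7) // 2, followed by
        # a further 2x downsampling; reproduces 100 -> 23 from the warning.
        return ((num_frames - 7) // 2) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A transducer needs at least one encoder frame per output token.
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # 23
    print(keep_cut(100, 24))              # False -> "Exclude cut ... from training."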
2023-11-20 01:39:56,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=888853.3333333334, ans=0.04949747468305833
2023-11-20 01:40:01,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=888853.3333333334, ans=0.2
2023-11-20 01:40:16,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=888986.6666666666, ans=0.125
2023-11-20 01:40:17,219 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 133350
2023-11-20 01:40:29,522 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 1100, loss[loss=0.08372, simple_loss=0.1119, pruned_loss=0.02088, audio_tagging_loss=0.006911, over 14836.00 frames. ], tot_loss[loss=0.08294, simple_loss=0.1032, pruned_loss=0.02129, audio_tagging_loss=0.01002, over 3033083.57 frames. ], batch size: 54, lr: 5.87e-03, grad_scale: 32.0
2023-11-20 01:40:32,058 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 01:40:32,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=889053.3333333334, ans=0.125
2023-11-20 01:40:36,409 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.72 vs. limit=22.5
2023-11-20 01:40:38,580 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.984e+01 8.052e+01 8.709e+01 9.479e+01 1.259e+02, threshold=1.742e+02, percent-clipped=0.0
2023-11-20 01:40:41,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=889120.0, ans=0.0
2023-11-20 01:41:11,939 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.72 vs. limit=22.5
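In the Whitening records, metric measures how far a module's output covariance is from a multiple of the identity (1.0 would be perfectly white), and the whitening penalty only engages when the metric exceeds its limit, as in the metric=22.72 vs. limit=22.5 record just above. One way such a metric can be computed, as a hedged sketch (the exact expression in scaling.py may differ):

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels); single group for simplicity.
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]
        num_channels = cov.shape[0]
        # Ratio is 1.0 iff cov is a multiple of the identity, larger otherwise.
        return num_channels * cov.pow(2).mean() / cov.diagonal().mean() ** 2

    x = torch.randn(2000, 256)      # roughly white activations
    print(whitening_metric(x))      # close to 1.0, far below e.g. limit=22.5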
2023-11-20 01:41:13,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=889253.3333333334, ans=0.125
2023-11-20 01:41:16,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=889253.3333333334, ans=0.0
2023-11-20 01:41:16,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=889253.3333333334, ans=0.125
2023-11-20 01:41:21,147 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 133400
2023-11-20 01:41:21,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=889320.0, ans=0.125
2023-11-20 01:41:28,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=889320.0, ans=0.2
2023-11-20 01:41:31,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=889320.0, ans=0.0
2023-11-20 01:41:34,531 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 1150, loss[loss=0.07343, simple_loss=0.0865, pruned_loss=0.018, audio_tagging_loss=0.01218, over 15555.00 frames. ], tot_loss[loss=0.08364, simple_loss=0.104, pruned_loss=0.02162, audio_tagging_loss=0.01002, over 3041248.97 frames. ], batch size: 57, lr: 5.87e-03, grad_scale: 32.0
2023-11-20 01:41:38,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=889386.6666666666, ans=0.2
2023-11-20 01:41:45,591 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.90 vs. limit=15.0
2023-11-20 01:41:49,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=889453.3333333334, ans=0.2
2023-11-20 01:41:55,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=889453.3333333334, ans=0.125
2023-11-20 01:42:03,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=889520.0, ans=0.125
2023-11-20 01:42:05,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=889520.0, ans=0.0
2023-11-20 01:42:09,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=889520.0, ans=0.0
2023-11-20 01:42:09,838 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.49 vs. limit=6.0
2023-11-20 01:42:11,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=889586.6666666666, ans=0.125
2023-11-20 01:42:11,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=889586.6666666666, ans=0.125
2023-11-20 01:42:25,933 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 133450
2023-11-20 01:42:39,283 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 1200, loss[loss=0.08342, simple_loss=0.1083, pruned_loss=0.02069, audio_tagging_loss=0.008598, over 14492.00 frames. ], tot_loss[loss=0.08375, simple_loss=0.1046, pruned_loss=0.02152, audio_tagging_loss=0.009941, over 3036584.68 frames. ], batch size: 54, lr: 5.87e-03, grad_scale: 32.0
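The scaling.py WithLoss records scattered through this stretch (loss-sum=0.000e+00 for most attention modules, 2.814e-01 later for one of them) track auxiliary penalties attached to a tensor without changing its forward value: the activation passes through unchanged, and the penalty's gradient is injected during backward. A sketch of that mechanism, as a hedged reading of the pattern rather than the verbatim scaling.py code:

    import torch

    class WithLoss(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x: torch.Tensor, aux_loss: torch.Tensor) -> torch.Tensor:
            ctx.aux_shape = aux_loss.shape
            return x.view_as(x)  # identity on the main activation

        @staticmethod
        def backward(ctx, x_grad: torch.Tensor):
            # aux_loss receives gradient 1, i.e. it is effectively added to
            # whatever objective is being backpropagated; loss-sum logs its value.
            ones = torch.ones(ctx.aux_shape, dtype=x_grad.dtype, device=x_grad.device)
            return x_grad, ones

    x = torch.randn(4, 8, requires_grad=True)
    aux = (x ** 2).mean()            # some penalty on the activations
    y = WithLoss.apply(x, aux)       # y equals x, but aux now shapes the gradients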
2023-11-20 01:42:40,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=889720.0, ans=0.125
2023-11-20 01:42:48,415 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.434e+01 8.251e+01 9.001e+01 9.736e+01 1.493e+02, threshold=1.800e+02, percent-clipped=0.0
2023-11-20 01:42:54,058 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.09 vs. limit=15.0
2023-11-20 01:42:56,440 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.24 vs. limit=22.5
2023-11-20 01:43:04,005 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.07 vs. limit=10.0
2023-11-20 01:43:18,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=889920.0, ans=0.0
2023-11-20 01:43:25,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=889920.0, ans=0.125
2023-11-20 01:43:31,757 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 133500
2023-11-20 01:43:34,734 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.60 vs. limit=15.0
2023-11-20 01:43:43,786 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 1250, loss[loss=0.08301, simple_loss=0.1071, pruned_loss=0.02053, audio_tagging_loss=0.008938, over 15871.00 frames. ], tot_loss[loss=0.08262, simple_loss=0.1031, pruned_loss=0.0211, audio_tagging_loss=0.009979, over 3039779.94 frames. ], batch size: 61, lr: 5.87e-03, grad_scale: 32.0
2023-11-20 01:44:16,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=890186.6666666666, ans=0.125
2023-11-20 01:44:35,376 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 133550
2023-11-20 01:44:38,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=890320.0, ans=0.125
2023-11-20 01:44:48,072 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 1300, loss[loss=0.08243, simple_loss=0.09925, pruned_loss=0.0227, audio_tagging_loss=0.01011, over 14920.00 frames. ], tot_loss[loss=0.08196, simple_loss=0.102, pruned_loss=0.02091, audio_tagging_loss=0.01004, over 3038140.28 frames. ], batch size: 57, lr: 5.87e-03, grad_scale: 64.0
2023-11-20 01:44:57,230 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 8.375e+01 9.143e+01 9.896e+01 1.258e+02, threshold=1.829e+02, percent-clipped=0.0
2023-11-20 01:45:10,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=890453.3333333334, ans=0.1
2023-11-20 01:45:33,761 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.56 vs.
limit=15.0 2023-11-20 01:45:39,232 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 133600 2023-11-20 01:45:45,170 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.82 vs. limit=12.0 2023-11-20 01:45:50,965 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.21 vs. limit=15.0 2023-11-20 01:45:52,902 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 1350, loss[loss=0.08849, simple_loss=0.1183, pruned_loss=0.01838, audio_tagging_loss=0.01096, over 15380.00 frames. ], tot_loss[loss=0.08164, simple_loss=0.1016, pruned_loss=0.02078, audio_tagging_loss=0.01007, over 3034368.39 frames. ], batch size: 58, lr: 5.87e-03, grad_scale: 32.0 2023-11-20 01:45:54,703 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.58 vs. limit=15.0 2023-11-20 01:45:56,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=890720.0, ans=0.125 2023-11-20 01:46:24,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=890853.3333333334, ans=0.125 2023-11-20 01:46:28,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=890853.3333333334, ans=0.0 2023-11-20 01:46:31,147 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 01:46:34,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=890920.0, ans=0.05 2023-11-20 01:46:39,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=890920.0, ans=0.05 2023-11-20 01:46:41,254 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 01:46:45,177 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 133650 2023-11-20 01:46:58,772 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 1400, loss[loss=0.07835, simple_loss=0.08928, pruned_loss=0.02204, audio_tagging_loss=0.01167, over 14356.00 frames. ], tot_loss[loss=0.08159, simple_loss=0.1012, pruned_loss=0.02077, audio_tagging_loss=0.01021, over 3030809.92 frames. 
], batch size: 54, lr: 5.87e-03, grad_scale: 32.0 2023-11-20 01:47:02,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=891053.3333333334, ans=0.125 2023-11-20 01:47:08,577 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.444e+01 7.927e+01 8.547e+01 9.494e+01 1.207e+02, threshold=1.709e+02, percent-clipped=0.0 2023-11-20 01:47:09,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=891053.3333333334, ans=0.125 2023-11-20 01:47:11,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=891120.0, ans=0.125 2023-11-20 01:47:24,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=891186.6666666666, ans=10.0 2023-11-20 01:47:43,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=891253.3333333334, ans=0.125 2023-11-20 01:47:45,326 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.95 vs. limit=15.0 2023-11-20 01:47:49,911 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 133700 2023-11-20 01:48:02,921 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 1450, loss[loss=0.1136, simple_loss=0.1464, pruned_loss=0.03082, audio_tagging_loss=0.009551, over 15345.00 frames. ], tot_loss[loss=0.0825, simple_loss=0.1024, pruned_loss=0.02103, audio_tagging_loss=0.01025, over 3032031.08 frames. ], batch size: 55, lr: 5.86e-03, grad_scale: 16.0 2023-11-20 01:48:54,321 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 133750 2023-11-20 01:48:56,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=891653.3333333334, ans=0.0 2023-11-20 01:49:01,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=891653.3333333334, ans=0.09899494936611666 2023-11-20 01:49:04,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=891653.3333333334, ans=0.035 2023-11-20 01:49:04,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=891653.3333333334, ans=0.125 2023-11-20 01:49:07,039 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 1500, loss[loss=0.1127, simple_loss=0.1479, pruned_loss=0.02963, audio_tagging_loss=0.009127, over 15492.00 frames. ], tot_loss[loss=0.08233, simple_loss=0.1023, pruned_loss=0.02091, audio_tagging_loss=0.01028, over 3040714.41 frames. 
], batch size: 59, lr: 5.86e-03, grad_scale: 16.0 2023-11-20 01:49:18,847 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 5.912e+01 7.823e+01 8.560e+01 9.381e+01 1.533e+02, threshold=1.712e+02, percent-clipped=0.0 2023-11-20 01:49:27,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=891786.6666666666, ans=0.125 2023-11-20 01:49:31,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=891853.3333333334, ans=0.0 2023-11-20 01:49:36,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=891853.3333333334, ans=0.1 2023-11-20 01:49:41,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=891853.3333333334, ans=0.0 2023-11-20 01:49:42,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=891853.3333333334, ans=0.1 2023-11-20 01:49:48,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=891920.0, ans=0.125 2023-11-20 01:49:59,153 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 133800 2023-11-20 01:50:01,516 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.03 vs. limit=22.5 2023-11-20 01:50:02,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=891986.6666666666, ans=0.125 2023-11-20 01:50:12,775 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 1550, loss[loss=0.07272, simple_loss=0.0819, pruned_loss=0.02032, audio_tagging_loss=0.01145, over 14246.00 frames. ], tot_loss[loss=0.08237, simple_loss=0.1021, pruned_loss=0.02102, audio_tagging_loss=0.01031, over 3038355.28 frames. ], batch size: 53, lr: 5.86e-03, grad_scale: 16.0 2023-11-20 01:50:22,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=892053.3333333334, ans=0.125 2023-11-20 01:50:40,933 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=22.5 2023-11-20 01:50:57,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=892253.3333333334, ans=0.0 2023-11-20 01:50:58,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=892253.3333333334, ans=0.1 2023-11-20 01:51:04,188 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 133850 2023-11-20 01:51:16,394 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 1600, loss[loss=0.08452, simple_loss=0.09544, pruned_loss=0.02484, audio_tagging_loss=0.01196, over 14580.00 frames. ], tot_loss[loss=0.0825, simple_loss=0.1022, pruned_loss=0.02103, audio_tagging_loss=0.01038, over 3040147.30 frames. 
], batch size: 55, lr: 5.86e-03, grad_scale: 32.0 2023-11-20 01:51:18,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=892386.6666666666, ans=0.125 2023-11-20 01:51:18,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=892386.6666666666, ans=0.125 2023-11-20 01:51:28,055 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.791e+01 8.075e+01 8.775e+01 9.622e+01 1.213e+02, threshold=1.755e+02, percent-clipped=0.0 2023-11-20 01:51:34,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=892453.3333333334, ans=0.2 2023-11-20 01:51:44,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=892520.0, ans=0.1 2023-11-20 01:52:01,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=892586.6666666666, ans=0.0 2023-11-20 01:52:09,105 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 133900 2023-11-20 01:52:22,010 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 1650, loss[loss=0.09256, simple_loss=0.1197, pruned_loss=0.02024, audio_tagging_loss=0.01245, over 16614.00 frames. ], tot_loss[loss=0.08254, simple_loss=0.1021, pruned_loss=0.02103, audio_tagging_loss=0.01044, over 3041325.41 frames. ], batch size: 58, lr: 5.86e-03, grad_scale: 32.0 2023-11-20 01:52:36,886 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.24 vs. limit=15.0 2023-11-20 01:52:37,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=892786.6666666666, ans=0.125 2023-11-20 01:52:57,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=892853.3333333334, ans=0.125 2023-11-20 01:53:02,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=892920.0, ans=0.035 2023-11-20 01:53:13,614 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 133950 2023-11-20 01:53:18,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=892986.6666666666, ans=0.1 2023-11-20 01:53:18,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=892986.6666666666, ans=0.125 2023-11-20 01:53:26,557 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 1700, loss[loss=0.1004, simple_loss=0.1273, pruned_loss=0.02515, audio_tagging_loss=0.01158, over 16370.00 frames. ], tot_loss[loss=0.08288, simple_loss=0.1026, pruned_loss=0.02119, audio_tagging_loss=0.01039, over 3041985.39 frames. 
], batch size: 59, lr: 5.86e-03, grad_scale: 32.0 2023-11-20 01:53:28,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=893053.3333333334, ans=0.125 2023-11-20 01:53:38,252 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.212e+01 8.118e+01 8.650e+01 9.250e+01 1.178e+02, threshold=1.730e+02, percent-clipped=0.0 2023-11-20 01:53:40,387 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.15 vs. limit=10.0 2023-11-20 01:53:42,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=893120.0, ans=0.04949747468305833 2023-11-20 01:54:01,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=893186.6666666666, ans=0.0 2023-11-20 01:54:05,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=893253.3333333334, ans=0.125 2023-11-20 01:54:18,421 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 134000 2023-11-20 01:54:31,557 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 1750, loss[loss=0.09984, simple_loss=0.1351, pruned_loss=0.02687, audio_tagging_loss=0.00543, over 15564.00 frames. ], tot_loss[loss=0.08254, simple_loss=0.1024, pruned_loss=0.02106, audio_tagging_loss=0.01029, over 3039543.79 frames. ], batch size: 56, lr: 5.86e-03, grad_scale: 32.0 2023-11-20 01:54:41,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=893386.6666666666, ans=0.125 2023-11-20 01:54:51,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=893453.3333333334, ans=0.1 2023-11-20 01:54:52,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=893453.3333333334, ans=0.0 2023-11-20 01:55:10,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=893586.6666666666, ans=0.2 2023-11-20 01:55:13,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=893586.6666666666, ans=0.125 2023-11-20 01:55:23,913 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 134050 2023-11-20 01:55:30,792 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.28 vs. limit=10.0 2023-11-20 01:55:32,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=893653.3333333334, ans=0.125 2023-11-20 01:55:36,235 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 1800, loss[loss=0.06797, simple_loss=0.08588, pruned_loss=0.01375, audio_tagging_loss=0.01128, over 16167.00 frames. ], tot_loss[loss=0.08189, simple_loss=0.1017, pruned_loss=0.02088, audio_tagging_loss=0.01016, over 3041865.56 frames. 
], batch size: 60, lr: 5.86e-03, grad_scale: 32.0 2023-11-20 01:55:42,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=893720.0, ans=0.125 2023-11-20 01:55:47,811 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.798e+01 8.080e+01 8.622e+01 9.512e+01 1.674e+02, threshold=1.724e+02, percent-clipped=0.0 2023-11-20 01:55:53,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=893786.6666666666, ans=0.125 2023-11-20 01:55:54,911 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.814e-01 2023-11-20 01:56:13,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=893853.3333333334, ans=0.125 2023-11-20 01:56:24,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=893920.0, ans=0.0 2023-11-20 01:56:28,533 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 134100 2023-11-20 01:56:36,226 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 01:56:41,480 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 1850, loss[loss=0.1009, simple_loss=0.1287, pruned_loss=0.02746, audio_tagging_loss=0.009069, over 15329.00 frames. ], tot_loss[loss=0.08179, simple_loss=0.1016, pruned_loss=0.02086, audio_tagging_loss=0.01014, over 3037919.30 frames. ], batch size: 57, lr: 5.86e-03, grad_scale: 32.0 2023-11-20 01:57:11,425 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 01:57:21,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=894253.3333333334, ans=0.125 2023-11-20 01:57:23,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=894253.3333333334, ans=0.2 2023-11-20 01:57:32,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=894320.0, ans=0.125 2023-11-20 01:57:33,248 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 134150 2023-11-20 01:57:45,587 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 1900, loss[loss=0.0812, simple_loss=0.09907, pruned_loss=0.02025, audio_tagging_loss=0.01141, over 14687.00 frames. ], tot_loss[loss=0.082, simple_loss=0.1019, pruned_loss=0.02092, audio_tagging_loss=0.01011, over 3041915.19 frames. ], batch size: 57, lr: 5.85e-03, grad_scale: 32.0 2023-11-20 01:57:59,166 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.930e+01 8.176e+01 8.935e+01 9.422e+01 1.185e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-20 01:57:59,808 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.84 vs. limit=22.5 2023-11-20 01:58:00,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=894453.3333333334, ans=0.0 2023-11-20 01:58:17,763 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.15 vs. 
limit=22.5 2023-11-20 01:58:17,799 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.19 vs. limit=22.5 2023-11-20 01:58:38,353 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 134200 2023-11-20 01:58:51,023 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 1950, loss[loss=0.07903, simple_loss=0.09597, pruned_loss=0.02039, audio_tagging_loss=0.01066, over 14392.00 frames. ], tot_loss[loss=0.08075, simple_loss=0.1001, pruned_loss=0.02053, audio_tagging_loss=0.01017, over 3043366.85 frames. ], batch size: 56, lr: 5.85e-03, grad_scale: 16.0 2023-11-20 01:59:14,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=894786.6666666666, ans=0.2 2023-11-20 01:59:26,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=894853.3333333334, ans=0.125 2023-11-20 01:59:42,037 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 134250 2023-11-20 01:59:54,807 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 2000, loss[loss=0.06583, simple_loss=0.07849, pruned_loss=0.01298, audio_tagging_loss=0.01361, over 14671.00 frames. ], tot_loss[loss=0.08102, simple_loss=0.1007, pruned_loss=0.02055, audio_tagging_loss=0.0101, over 3033608.69 frames. ], batch size: 55, lr: 5.85e-03, grad_scale: 32.0 2023-11-20 02:00:02,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=895053.3333333334, ans=0.2 2023-11-20 02:00:08,113 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.403e+01 7.896e+01 8.593e+01 9.449e+01 1.399e+02, threshold=1.719e+02, percent-clipped=0.0 2023-11-20 02:00:46,700 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.69 vs. limit=22.5 2023-11-20 02:00:47,369 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 134300 2023-11-20 02:00:59,931 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 2050, loss[loss=0.08885, simple_loss=0.09988, pruned_loss=0.02437, audio_tagging_loss=0.01454, over 14720.00 frames. ], tot_loss[loss=0.0817, simple_loss=0.1019, pruned_loss=0.0207, audio_tagging_loss=0.01005, over 3040953.98 frames. ], batch size: 54, lr: 5.85e-03, grad_scale: 32.0 2023-11-20 02:01:17,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=895453.3333333334, ans=0.2 2023-11-20 02:01:18,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=895453.3333333334, ans=0.125 2023-11-20 02:01:51,624 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 134350 2023-11-20 02:02:02,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=895653.3333333334, ans=0.125 2023-11-20 02:02:05,139 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 2100, loss[loss=0.09432, simple_loss=0.1174, pruned_loss=0.02622, audio_tagging_loss=0.009404, over 15751.00 frames. ], tot_loss[loss=0.08151, simple_loss=0.1017, pruned_loss=0.02062, audio_tagging_loss=0.01003, over 3049531.98 frames. 
], batch size: 58, lr: 5.85e-03, grad_scale: 16.0 2023-11-20 02:02:10,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=895720.0, ans=0.1 2023-11-20 02:02:10,381 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 02:02:10,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=895720.0, ans=0.125 2023-11-20 02:02:10,666 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.63 vs. limit=10.0 2023-11-20 02:02:14,855 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.24 vs. limit=10.0 2023-11-20 02:02:19,141 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.798e+01 8.312e+01 8.947e+01 9.682e+01 1.152e+02, threshold=1.789e+02, percent-clipped=0.0 2023-11-20 02:02:40,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=895853.3333333334, ans=0.1 2023-11-20 02:02:40,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=895853.3333333334, ans=0.04949747468305833 2023-11-20 02:02:56,470 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 134400 2023-11-20 02:02:56,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=895986.6666666666, ans=0.0 2023-11-20 02:03:09,588 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 2150, loss[loss=0.08533, simple_loss=0.1053, pruned_loss=0.02032, audio_tagging_loss=0.01237, over 15776.00 frames. ], tot_loss[loss=0.08199, simple_loss=0.1023, pruned_loss=0.02073, audio_tagging_loss=0.01011, over 3051606.93 frames. ], batch size: 58, lr: 5.85e-03, grad_scale: 16.0 2023-11-20 02:03:18,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=896053.3333333334, ans=0.0 2023-11-20 02:03:32,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=896120.0, ans=0.0 2023-11-20 02:03:33,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=896120.0, ans=10.0 2023-11-20 02:03:48,946 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 02:03:49,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=896253.3333333334, ans=0.0 2023-11-20 02:04:01,710 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 134450 2023-11-20 02:04:14,510 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 2200, loss[loss=0.06725, simple_loss=0.07704, pruned_loss=0.01692, audio_tagging_loss=0.01182, over 13938.00 frames. 
], tot_loss[loss=0.08275, simple_loss=0.103, pruned_loss=0.02109, audio_tagging_loss=0.01018, over 3048974.33 frames. ], batch size: 55, lr: 5.85e-03, grad_scale: 16.0 2023-11-20 02:04:28,770 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.245e+01 8.199e+01 8.937e+01 9.521e+01 1.153e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-20 02:04:52,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=896586.6666666666, ans=0.125 2023-11-20 02:05:01,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=896586.6666666666, ans=0.2 2023-11-20 02:05:06,374 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 134500 2023-11-20 02:05:18,728 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.04 vs. limit=15.0 2023-11-20 02:05:19,206 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 2250, loss[loss=0.07707, simple_loss=0.09095, pruned_loss=0.0204, audio_tagging_loss=0.0112, over 16762.00 frames. ], tot_loss[loss=0.08334, simple_loss=0.1034, pruned_loss=0.0214, audio_tagging_loss=0.01023, over 3054105.62 frames. ], batch size: 62, lr: 5.85e-03, grad_scale: 16.0 2023-11-20 02:05:19,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=896720.0, ans=0.0 2023-11-20 02:05:30,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=896720.0, ans=0.0 2023-11-20 02:05:36,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=896786.6666666666, ans=0.1 2023-11-20 02:05:41,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=896786.6666666666, ans=0.07 2023-11-20 02:06:10,908 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 134550 2023-11-20 02:06:19,612 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 02:06:24,400 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 2300, loss[loss=0.09805, simple_loss=0.1195, pruned_loss=0.03078, audio_tagging_loss=0.007543, over 14441.00 frames. ], tot_loss[loss=0.08289, simple_loss=0.1029, pruned_loss=0.02114, audio_tagging_loss=0.0103, over 3045891.68 frames. ], batch size: 53, lr: 5.85e-03, grad_scale: 16.0 2023-11-20 02:06:26,406 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.57 vs. limit=12.0 2023-11-20 02:06:29,897 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.14 vs. 
limit=22.5 2023-11-20 02:06:38,518 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.838e+01 8.171e+01 8.997e+01 9.810e+01 1.855e+02, threshold=1.799e+02, percent-clipped=1.0 2023-11-20 02:06:59,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=897186.6666666666, ans=10.0 2023-11-20 02:07:01,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=897253.3333333334, ans=0.1 2023-11-20 02:07:15,453 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 134600 2023-11-20 02:07:20,757 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 02:07:23,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=897320.0, ans=0.125 2023-11-20 02:07:28,725 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 2350, loss[loss=0.08757, simple_loss=0.105, pruned_loss=0.0216, audio_tagging_loss=0.01345, over 15042.00 frames. ], tot_loss[loss=0.08247, simple_loss=0.1023, pruned_loss=0.02093, audio_tagging_loss=0.01039, over 3041557.10 frames. ], batch size: 56, lr: 5.84e-03, grad_scale: 16.0 2023-11-20 02:07:46,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=897453.3333333334, ans=0.125 2023-11-20 02:07:51,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=897453.3333333334, ans=0.0 2023-11-20 02:08:05,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=897520.0, ans=0.125 2023-11-20 02:08:07,029 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.49 vs. limit=12.0 2023-11-20 02:08:20,685 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 134650 2023-11-20 02:08:33,524 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 2400, loss[loss=0.07042, simple_loss=0.08158, pruned_loss=0.01878, audio_tagging_loss=0.01085, over 14379.00 frames. ], tot_loss[loss=0.08269, simple_loss=0.1026, pruned_loss=0.02093, audio_tagging_loss=0.01046, over 3040486.26 frames. 
], batch size: 57, lr: 5.84e-03, grad_scale: 32.0 2023-11-20 02:08:42,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=897720.0, ans=0.1 2023-11-20 02:08:45,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=897786.6666666666, ans=0.0 2023-11-20 02:08:47,719 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.416e+01 8.072e+01 8.719e+01 9.774e+01 1.313e+02, threshold=1.744e+02, percent-clipped=0.0 2023-11-20 02:08:57,110 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 02:08:59,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=897853.3333333334, ans=0.0 2023-11-20 02:09:04,053 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.21 vs. limit=12.0 2023-11-20 02:09:14,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=897920.0, ans=0.0 2023-11-20 02:09:24,715 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 134700 2023-11-20 02:09:26,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.09 vs. limit=15.0 2023-11-20 02:09:33,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=897986.6666666666, ans=0.125 2023-11-20 02:09:37,393 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 2450, loss[loss=0.07506, simple_loss=0.08198, pruned_loss=0.02288, audio_tagging_loss=0.01119, over 13961.00 frames. ], tot_loss[loss=0.08306, simple_loss=0.1031, pruned_loss=0.02111, audio_tagging_loss=0.01041, over 3042752.00 frames. ], batch size: 53, lr: 5.84e-03, grad_scale: 16.0 2023-11-20 02:09:41,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=898053.3333333334, ans=0.95 2023-11-20 02:09:53,926 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 02:09:58,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=898120.0, ans=0.07 2023-11-20 02:10:08,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=898186.6666666666, ans=0.125 2023-11-20 02:10:16,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=898253.3333333334, ans=0.1 2023-11-20 02:10:24,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=898253.3333333334, ans=0.125 2023-11-20 02:10:26,158 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.65 vs. 
limit=22.5 2023-11-20 02:10:29,969 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 134750 2023-11-20 02:10:30,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=898320.0, ans=0.0 2023-11-20 02:10:34,230 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.40 vs. limit=15.0 2023-11-20 02:10:35,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=898320.0, ans=0.125 2023-11-20 02:10:36,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=898320.0, ans=0.0 2023-11-20 02:10:42,748 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 2500, loss[loss=0.08208, simple_loss=0.1055, pruned_loss=0.02047, audio_tagging_loss=0.008871, over 15426.00 frames. ], tot_loss[loss=0.08301, simple_loss=0.1032, pruned_loss=0.0211, audio_tagging_loss=0.01032, over 3053547.56 frames. ], batch size: 57, lr: 5.84e-03, grad_scale: 16.0 2023-11-20 02:10:44,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=898386.6666666666, ans=0.1 2023-11-20 02:10:57,311 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.817e+01 7.993e+01 8.796e+01 9.570e+01 1.207e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-20 02:11:06,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=898520.0, ans=0.125 2023-11-20 02:11:17,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=898520.0, ans=0.0 2023-11-20 02:11:20,783 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.78 vs. limit=22.5 2023-11-20 02:11:29,780 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.47 vs. limit=22.5 2023-11-20 02:11:32,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=898586.6666666666, ans=15.0 2023-11-20 02:11:34,140 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 134800 2023-11-20 02:11:34,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=898653.3333333334, ans=0.0 2023-11-20 02:11:36,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=898653.3333333334, ans=0.125 2023-11-20 02:11:46,915 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 2550, loss[loss=0.1083, simple_loss=0.1321, pruned_loss=0.0351, audio_tagging_loss=0.007128, over 15400.00 frames. ], tot_loss[loss=0.08284, simple_loss=0.1029, pruned_loss=0.02107, audio_tagging_loss=0.0103, over 3046754.63 frames. 
], batch size: 56, lr: 5.84e-03, grad_scale: 16.0 2023-11-20 02:12:18,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=898853.3333333334, ans=0.0 2023-11-20 02:12:39,682 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 134850 2023-11-20 02:12:52,579 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 2600, loss[loss=0.0722, simple_loss=0.09063, pruned_loss=0.01659, audio_tagging_loss=0.01029, over 15319.00 frames. ], tot_loss[loss=0.08277, simple_loss=0.103, pruned_loss=0.02101, audio_tagging_loss=0.01026, over 3052405.52 frames. ], batch size: 58, lr: 5.84e-03, grad_scale: 16.0 2023-11-20 02:12:59,317 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0 2023-11-20 02:13:08,532 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.034e+01 8.150e+01 8.781e+01 9.502e+01 1.826e+02, threshold=1.756e+02, percent-clipped=1.0 2023-11-20 02:13:14,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=899120.0, ans=0.1 2023-11-20 02:13:28,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=899186.6666666666, ans=0.125 2023-11-20 02:13:32,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=899253.3333333334, ans=0.125 2023-11-20 02:13:39,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=899253.3333333334, ans=0.025 2023-11-20 02:13:42,800 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.65 vs. limit=15.0 2023-11-20 02:13:44,806 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 134900 2023-11-20 02:13:55,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=899320.0, ans=0.0 2023-11-20 02:13:57,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=899386.6666666666, ans=0.125 2023-11-20 02:13:58,385 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 2650, loss[loss=0.05391, simple_loss=0.06654, pruned_loss=0.008631, audio_tagging_loss=0.012, over 15336.00 frames. ], tot_loss[loss=0.08226, simple_loss=0.102, pruned_loss=0.02089, audio_tagging_loss=0.01039, over 3053349.76 frames. 
], batch size: 61, lr: 5.84e-03, grad_scale: 16.0 2023-11-20 02:14:03,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=899386.6666666666, ans=0.2 2023-11-20 02:14:22,426 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 02:14:38,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=899586.6666666666, ans=0.0 2023-11-20 02:14:38,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=899586.6666666666, ans=0.1 2023-11-20 02:14:49,807 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 134950 2023-11-20 02:14:56,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=899653.3333333334, ans=0.125 2023-11-20 02:15:02,147 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 2700, loss[loss=0.06946, simple_loss=0.0726, pruned_loss=0.02086, audio_tagging_loss=0.0123, over 13952.00 frames. ], tot_loss[loss=0.08184, simple_loss=0.1016, pruned_loss=0.02079, audio_tagging_loss=0.01028, over 3044726.13 frames. ], batch size: 54, lr: 5.84e-03, grad_scale: 16.0 2023-11-20 02:15:02,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=899720.0, ans=0.1 2023-11-20 02:15:06,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=899720.0, ans=0.125 2023-11-20 02:15:14,762 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.22 vs. limit=15.0 2023-11-20 02:15:18,164 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.493e+01 8.480e+01 9.136e+01 9.839e+01 1.399e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-20 02:15:21,587 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.99 vs. limit=22.5 2023-11-20 02:15:25,852 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 02:15:35,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=899853.3333333334, ans=0.125 2023-11-20 02:15:36,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=899853.3333333334, ans=0.125 2023-11-20 02:15:43,115 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2023-11-20 02:15:51,627 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.99 vs. limit=15.0 2023-11-20 02:15:54,286 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 135000 2023-11-20 02:16:02,397 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.20 vs. 
limit=22.5 2023-11-20 02:16:07,435 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 2750, loss[loss=0.08707, simple_loss=0.1032, pruned_loss=0.02429, audio_tagging_loss=0.0112, over 16362.00 frames. ], tot_loss[loss=0.08243, simple_loss=0.1023, pruned_loss=0.02113, audio_tagging_loss=0.01016, over 3054082.18 frames. ], batch size: 60, lr: 5.84e-03, grad_scale: 16.0 2023-11-20 02:16:10,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=900053.3333333334, ans=0.0 2023-11-20 02:16:32,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=900186.6666666666, ans=0.2 2023-11-20 02:16:40,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=900186.6666666666, ans=0.125 2023-11-20 02:16:49,708 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.60 vs. limit=22.5 2023-11-20 02:16:59,883 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 135050 2023-11-20 02:17:03,549 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 02:17:11,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=900386.6666666666, ans=0.125 2023-11-20 02:17:12,182 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 2800, loss[loss=0.05442, simple_loss=0.05893, pruned_loss=0.01437, audio_tagging_loss=0.01059, over 15242.00 frames. ], tot_loss[loss=0.08172, simple_loss=0.1014, pruned_loss=0.02089, audio_tagging_loss=0.01011, over 3053728.32 frames. ], batch size: 61, lr: 5.83e-03, grad_scale: 32.0 2023-11-20 02:17:19,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=900386.6666666666, ans=0.125 2023-11-20 02:17:24,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=900453.3333333334, ans=0.025 2023-11-20 02:17:28,019 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.764e+01 8.003e+01 8.693e+01 9.427e+01 1.214e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-20 02:17:33,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=900453.3333333334, ans=0.0 2023-11-20 02:17:40,507 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.53 vs. limit=15.0 2023-11-20 02:18:04,376 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 135100 2023-11-20 02:18:09,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=900653.3333333334, ans=0.09899494936611666 2023-11-20 02:18:17,509 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 2850, loss[loss=0.07381, simple_loss=0.09448, pruned_loss=0.01964, audio_tagging_loss=0.00693, over 15301.00 frames. 
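[Editor's note] The WARNING above shows why the dummy-labelled AudioSet cuts get dropped: after subsampling the encoder would emit 23 frames, one fewer than the 24 placeholder tokens, and a transducer loss has no valid alignment when the output is shorter than the token sequence. A minimal sketch of such a filter; the subsampling formula is an assumption chosen to match the logged 100 -> 23.

```python
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Drop cuts a transducer cannot align: fewer output frames than tokens.

    The subsampling formula is an assumption that matches the logged
    example (100 input frames -> 23 frames after subsampling).
    """
    frames_after = ((num_frames - 7) // 2 + 1) // 2
    return frames_after >= num_tokens


# The excluded AudioSet cut above: 100 frames -> 23, but 24 BPE tokens.
assert keep_cut(100, 24) is False
```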
], tot_loss[loss=0.08075, simple_loss=0.1003, pruned_loss=0.02054, audio_tagging_loss=0.01007, over 3047626.26 frames. ], batch size: 58, lr: 5.83e-03, grad_scale: 32.0 2023-11-20 02:18:38,898 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=22.5 2023-11-20 02:19:05,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=900920.0, ans=0.125 2023-11-20 02:19:05,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=900920.0, ans=0.1 2023-11-20 02:19:09,329 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 135150 2023-11-20 02:19:21,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=901053.3333333334, ans=0.0 2023-11-20 02:19:22,096 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 2900, loss[loss=0.08214, simple_loss=0.1002, pruned_loss=0.02413, audio_tagging_loss=0.007926, over 14765.00 frames. ], tot_loss[loss=0.08113, simple_loss=0.1009, pruned_loss=0.02057, audio_tagging_loss=0.01013, over 3044395.45 frames. ], batch size: 58, lr: 5.83e-03, grad_scale: 32.0 2023-11-20 02:19:37,586 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.047e+01 8.169e+01 9.046e+01 9.875e+01 2.052e+02, threshold=1.809e+02, percent-clipped=1.0 2023-11-20 02:20:13,511 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 135200 2023-11-20 02:20:16,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=901320.0, ans=0.0 2023-11-20 02:20:26,785 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 2950, loss[loss=0.08937, simple_loss=0.1115, pruned_loss=0.02355, audio_tagging_loss=0.01009, over 16508.00 frames. ], tot_loss[loss=0.08178, simple_loss=0.1017, pruned_loss=0.02077, audio_tagging_loss=0.01017, over 3049109.15 frames. ], batch size: 62, lr: 5.83e-03, grad_scale: 32.0 2023-11-20 02:20:38,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=901453.3333333334, ans=0.125 2023-11-20 02:20:57,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=901520.0, ans=0.125 2023-11-20 02:20:59,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=901520.0, ans=0.1 2023-11-20 02:21:02,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=901520.0, ans=0.0 2023-11-20 02:21:12,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=901586.6666666666, ans=0.2 2023-11-20 02:21:14,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=901586.6666666666, ans=0.0 2023-11-20 02:21:14,632 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. 
limit=6.0 2023-11-20 02:21:18,787 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 135250 2023-11-20 02:21:31,803 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 3000, loss[loss=0.08467, simple_loss=0.1076, pruned_loss=0.021, audio_tagging_loss=0.009889, over 14279.00 frames. ], tot_loss[loss=0.08218, simple_loss=0.1021, pruned_loss=0.02088, audio_tagging_loss=0.01024, over 3043257.31 frames. ], batch size: 54, lr: 5.83e-03, grad_scale: 16.0 2023-11-20 02:21:31,807 INFO [train_asr.py:1285] (0/4) Computing validation loss 2023-11-20 02:22:00,914 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8523, 4.9210, 4.9900, 4.8897], device='cuda:0') 2023-11-20 02:22:13,529 INFO [train_asr.py:1294] (0/4) Epoch 12, validation: loss=0.0631, simple_loss=0.05442, pruned_loss=0.006068, audio_tagging_loss=0.02982, over 4681554.00 frames. 2023-11-20 02:22:13,530 INFO [train_asr.py:1295] (0/4) Maximum memory allocated so far is 25925MB 2023-11-20 02:22:29,866 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.617e+01 8.337e+01 8.935e+01 9.823e+01 2.024e+02, threshold=1.787e+02, percent-clipped=1.0 2023-11-20 02:22:41,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=901853.3333333334, ans=0.2 2023-11-20 02:22:57,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=901920.0, ans=0.125 2023-11-20 02:23:05,058 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 135300 2023-11-20 02:23:09,240 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.67 vs. limit=15.0 2023-11-20 02:23:13,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=901986.6666666666, ans=0.125 2023-11-20 02:23:17,199 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 3050, loss[loss=0.08426, simple_loss=0.1042, pruned_loss=0.02146, audio_tagging_loss=0.01068, over 15536.00 frames. ], tot_loss[loss=0.08241, simple_loss=0.1023, pruned_loss=0.02109, audio_tagging_loss=0.01016, over 3041540.33 frames. ], batch size: 57, lr: 5.83e-03, grad_scale: 16.0 2023-11-20 02:23:19,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=902053.3333333334, ans=0.2 2023-11-20 02:23:56,509 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 02:24:09,908 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 135350 2023-11-20 02:24:21,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=902386.6666666666, ans=0.125 2023-11-20 02:24:22,080 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 3100, loss[loss=0.08622, simple_loss=0.1127, pruned_loss=0.01807, audio_tagging_loss=0.0118, over 14288.00 frames. 
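[Editor's note] At batch 3000 above, the loop pauses to compute a validation loss over roughly 4.7M frames and then reports peak CUDA memory. A hedged sketch of that step follows; the model/batch interface is hypothetical, and only torch.cuda.max_memory_allocated is a real API call.

```python
import torch

def compute_validation_loss(model, valid_loader, device="cuda:0"):
    """Frame-weighted average loss over the validation set (sketch)."""
    model.eval()
    tot_loss, tot_frames = 0.0, 0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = model(batch)  # hypothetical interface
            tot_loss += loss.item() * num_frames
            tot_frames += num_frames
    model.train()
    peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"Maximum memory allocated so far is {peak_mb}MB")
    return tot_loss / tot_frames
```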
], tot_loss[loss=0.08214, simple_loss=0.1023, pruned_loss=0.02081, audio_tagging_loss=0.01019, over 3042757.23 frames. ], batch size: 52, lr: 5.83e-03, grad_scale: 16.0 2023-11-20 02:24:27,171 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 02:24:39,887 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.062e+01 8.132e+01 9.002e+01 1.001e+02 1.327e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-20 02:24:47,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=902520.0, ans=0.1 2023-11-20 02:25:02,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=902586.6666666666, ans=0.125 2023-11-20 02:25:14,133 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 135400 2023-11-20 02:25:24,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=902653.3333333334, ans=0.0 2023-11-20 02:25:27,953 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 3150, loss[loss=0.07042, simple_loss=0.08543, pruned_loss=0.01421, audio_tagging_loss=0.0135, over 15378.00 frames. ], tot_loss[loss=0.08245, simple_loss=0.1027, pruned_loss=0.0209, audio_tagging_loss=0.01022, over 3038773.87 frames. ], batch size: 59, lr: 5.83e-03, grad_scale: 16.0 2023-11-20 02:25:31,431 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.100e-03 2023-11-20 02:25:37,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=902720.0, ans=0.1 2023-11-20 02:25:46,409 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.55 vs. limit=15.0 2023-11-20 02:26:00,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=902853.3333333334, ans=0.05 2023-11-20 02:26:06,133 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.55 vs. limit=15.0 2023-11-20 02:26:20,994 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 135450 2023-11-20 02:26:28,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=902986.6666666666, ans=0.125 2023-11-20 02:26:33,308 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 3200, loss[loss=0.09425, simple_loss=0.1137, pruned_loss=0.02694, audio_tagging_loss=0.01047, over 14365.00 frames. ], tot_loss[loss=0.08216, simple_loss=0.1018, pruned_loss=0.02081, audio_tagging_loss=0.01045, over 3043612.91 frames. ], batch size: 57, lr: 5.83e-03, grad_scale: 32.0 2023-11-20 02:26:37,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=903053.3333333334, ans=0.0 2023-11-20 02:26:39,447 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.09 vs. 
limit=12.0 2023-11-20 02:26:42,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=903053.3333333334, ans=0.2 2023-11-20 02:26:43,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=903053.3333333334, ans=0.125 2023-11-20 02:26:49,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=903120.0, ans=0.125 2023-11-20 02:26:49,826 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.506e+01 8.472e+01 8.960e+01 9.869e+01 1.272e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-20 02:27:06,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=903186.6666666666, ans=0.0 2023-11-20 02:27:23,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=903253.3333333334, ans=0.09899494936611666 2023-11-20 02:27:25,472 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 135500 2023-11-20 02:27:25,635 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 02:27:31,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=903320.0, ans=0.0 2023-11-20 02:27:38,272 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 3250, loss[loss=0.09269, simple_loss=0.1165, pruned_loss=0.02108, audio_tagging_loss=0.01335, over 16062.00 frames. ], tot_loss[loss=0.08206, simple_loss=0.1016, pruned_loss=0.02076, audio_tagging_loss=0.01052, over 3044821.25 frames. ], batch size: 60, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:27:39,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=903386.6666666666, ans=0.2 2023-11-20 02:27:55,372 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.88 vs. 
limit=10.0 2023-11-20 02:28:03,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=903453.3333333334, ans=0.0 2023-11-20 02:28:05,554 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 02:28:06,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=903520.0, ans=0.125 2023-11-20 02:28:08,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=903520.0, ans=0.2 2023-11-20 02:28:09,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=903520.0, ans=0.125 2023-11-20 02:28:11,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=903520.0, ans=0.125 2023-11-20 02:28:13,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=903520.0, ans=0.0 2023-11-20 02:28:21,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=903586.6666666666, ans=0.0 2023-11-20 02:28:30,486 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 135550 2023-11-20 02:28:33,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=903653.3333333334, ans=0.125 2023-11-20 02:28:41,360 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 02:28:43,510 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 3300, loss[loss=0.113, simple_loss=0.1464, pruned_loss=0.03261, audio_tagging_loss=0.007176, over 16398.00 frames. ], tot_loss[loss=0.08169, simple_loss=0.101, pruned_loss=0.02055, audio_tagging_loss=0.01064, over 3047224.30 frames. ], batch size: 61, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:28:50,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=903720.0, ans=0.1 2023-11-20 02:29:00,612 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.558e+01 8.302e+01 8.686e+01 9.682e+01 1.210e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-20 02:29:06,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=903786.6666666666, ans=0.125 2023-11-20 02:29:08,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=903853.3333333334, ans=0.0 2023-11-20 02:29:35,813 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 135600 2023-11-20 02:29:40,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=903986.6666666666, ans=10.0 2023-11-20 02:29:44,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=903986.6666666666, ans=0.125 2023-11-20 02:29:48,787 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 3350, loss[loss=0.09052, simple_loss=0.1096, pruned_loss=0.0265, audio_tagging_loss=0.009229, over 15659.00 frames. ], tot_loss[loss=0.08281, simple_loss=0.1028, pruned_loss=0.02097, audio_tagging_loss=0.01044, over 3054185.19 frames. 
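[Editor's note] Most scaling.py:213 entries have one shape: a named ScheduledFloat evaluated at the current batch_count, with "ans" as its current value. The sketch below mimics that behaviour as piecewise-linear interpolation between (batch_count, value) breakpoints, held constant beyond the ends; it is an illustration, not the actual scaling.py class.

```python
import bisect

class ScheduledFloatSketch:
    """Piecewise-linear float schedule keyed on batch_count (illustrative)."""

    def __init__(self, *points):
        pts = sorted(points)           # e.g. (0.0, 0.3), (20000.0, 0.1)
        self.xs = [x for x, _ in pts]
        self.ys = [y for _, y in pts]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (batch_count - x0) * (y1 - y0) / (x1 - x0)


dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
assert dropout_p(899120.0) == 0.1  # far past the last breakpoint: constant
```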
], batch size: 61, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:29:56,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=904053.3333333334, ans=0.2 2023-11-20 02:30:00,900 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.37 vs. limit=15.0 2023-11-20 02:30:04,684 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2023-11-20 02:30:10,433 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.48 vs. limit=15.0 2023-11-20 02:30:20,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=904186.6666666666, ans=0.0 2023-11-20 02:30:23,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=904186.6666666666, ans=0.125 2023-11-20 02:30:39,810 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 135650 2023-11-20 02:30:52,691 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 3400, loss[loss=0.07789, simple_loss=0.1016, pruned_loss=0.01578, audio_tagging_loss=0.01132, over 15538.00 frames. ], tot_loss[loss=0.08298, simple_loss=0.1032, pruned_loss=0.02102, audio_tagging_loss=0.01036, over 3058883.87 frames. ], batch size: 57, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:31:09,370 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.968e+01 8.145e+01 8.862e+01 9.550e+01 1.351e+02, threshold=1.772e+02, percent-clipped=0.0 2023-11-20 02:31:17,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=904453.3333333334, ans=0.0 2023-11-20 02:31:19,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=904520.0, ans=0.125 2023-11-20 02:31:27,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=904520.0, ans=0.125 2023-11-20 02:31:44,809 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 135700 2023-11-20 02:31:46,617 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.63 vs. limit=12.0 2023-11-20 02:31:47,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=904653.3333333334, ans=0.125 2023-11-20 02:31:51,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=904653.3333333334, ans=0.125 2023-11-20 02:31:57,659 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 3450, loss[loss=0.06882, simple_loss=0.08271, pruned_loss=0.01657, audio_tagging_loss=0.0109, over 15734.00 frames. ], tot_loss[loss=0.08239, simple_loss=0.1026, pruned_loss=0.02084, audio_tagging_loss=0.01023, over 3057199.01 frames. 
], batch size: 60, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:32:15,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=904786.6666666666, ans=10.0 2023-11-20 02:32:19,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=904786.6666666666, ans=0.125 2023-11-20 02:32:22,055 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.55 vs. limit=15.0 2023-11-20 02:32:25,096 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.92 vs. limit=15.0 2023-11-20 02:32:44,219 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.83 vs. limit=15.0 2023-11-20 02:32:49,592 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 135750 2023-11-20 02:33:03,110 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 3500, loss[loss=0.08422, simple_loss=0.1129, pruned_loss=0.01979, audio_tagging_loss=0.00796, over 16293.00 frames. ], tot_loss[loss=0.08204, simple_loss=0.1021, pruned_loss=0.02079, audio_tagging_loss=0.01019, over 3057314.18 frames. ], batch size: 62, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:33:05,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=905053.3333333334, ans=0.05 2023-11-20 02:33:08,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=905053.3333333334, ans=0.125 2023-11-20 02:33:19,497 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.042e+01 8.771e+01 9.579e+01 1.310e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-20 02:33:23,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=905120.0, ans=0.2 2023-11-20 02:33:36,425 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 02:33:55,116 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 135800 2023-11-20 02:34:02,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=905320.0, ans=0.125 2023-11-20 02:34:06,472 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.92 vs. limit=6.0 2023-11-20 02:34:08,280 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 3550, loss[loss=0.06832, simple_loss=0.0875, pruned_loss=0.01854, audio_tagging_loss=0.00603, over 15139.00 frames. ], tot_loss[loss=0.08173, simple_loss=0.1017, pruned_loss=0.02073, audio_tagging_loss=0.01015, over 3059832.68 frames. 
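[Editor's note] The scaling.py:1022 "Whitening" entries compare a per-module metric against a limit, and the corrective gradient is only applied once the metric exceeds that limit. As an illustration only (not the exact formula in scaling.py), one such departure-from-whiteness measure is the ratio of the mean squared eigenvalue of the feature covariance to its squared mean eigenvalue: it equals 1.0 for perfectly white features and grows as the covariance becomes more lopsided.

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """Departure-from-whiteness of features with shape (..., num_channels)."""
    x = x.reshape(-1, x.shape[-1]).float()
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)  # real eigenvalues, ascending
    return float((eigs ** 2).mean() / eigs.mean() ** 2)


# White features score ~1.0, i.e. well under limits like 15.0 in the log.
print(whitening_metric(torch.randn(4000, 192)))
```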
], batch size: 58, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:34:11,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=905386.6666666666, ans=0.125 2023-11-20 02:34:48,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=905586.6666666666, ans=0.025 2023-11-20 02:34:59,942 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 135850 2023-11-20 02:35:00,434 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.78 vs. limit=15.0 2023-11-20 02:35:12,746 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 3600, loss[loss=0.07191, simple_loss=0.09117, pruned_loss=0.01812, audio_tagging_loss=0.008197, over 14576.00 frames. ], tot_loss[loss=0.08091, simple_loss=0.1003, pruned_loss=0.02061, audio_tagging_loss=0.01013, over 3051546.12 frames. ], batch size: 54, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:35:29,277 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.874e+01 8.123e+01 9.173e+01 1.010e+02 1.525e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-20 02:35:36,997 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.49 vs. limit=15.0 2023-11-20 02:35:44,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=905853.3333333334, ans=0.125 2023-11-20 02:36:04,323 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 135900 2023-11-20 02:36:11,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=905986.6666666666, ans=0.2 2023-11-20 02:36:17,249 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 3650, loss[loss=0.1038, simple_loss=0.1403, pruned_loss=0.02549, audio_tagging_loss=0.008172, over 15946.00 frames. ], tot_loss[loss=0.08185, simple_loss=0.1016, pruned_loss=0.02098, audio_tagging_loss=0.01009, over 3051208.51 frames. ], batch size: 59, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:36:24,715 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.66 vs. limit=6.0 2023-11-20 02:36:27,316 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.79 vs. limit=22.5 2023-11-20 02:36:43,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=906186.6666666666, ans=0.125 2023-11-20 02:36:51,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=906186.6666666666, ans=0.125 2023-11-20 02:36:52,008 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.61 vs. 
limit=15.0 2023-11-20 02:37:03,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=906253.3333333334, ans=0.025 2023-11-20 02:37:09,379 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 135950 2023-11-20 02:37:22,752 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 3700, loss[loss=0.08555, simple_loss=0.1041, pruned_loss=0.0224, audio_tagging_loss=0.01109, over 15603.00 frames. ], tot_loss[loss=0.08206, simple_loss=0.102, pruned_loss=0.02102, audio_tagging_loss=0.01004, over 3050224.62 frames. ], batch size: 57, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:37:26,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=906386.6666666666, ans=0.0 2023-11-20 02:37:29,176 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=4.899e-01 2023-11-20 02:37:30,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=906386.6666666666, ans=0.125 2023-11-20 02:37:36,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=906453.3333333334, ans=0.125 2023-11-20 02:37:38,492 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.777e+01 8.262e+01 8.837e+01 9.420e+01 1.280e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-20 02:37:53,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=906520.0, ans=0.125 2023-11-20 02:38:04,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=906586.6666666666, ans=0.0 2023-11-20 02:38:10,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=906586.6666666666, ans=0.125 2023-11-20 02:38:13,895 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 136000 2023-11-20 02:38:15,421 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-136000.pt 2023-11-20 02:38:29,433 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 3750, loss[loss=0.111, simple_loss=0.1329, pruned_loss=0.03731, audio_tagging_loss=0.007213, over 15080.00 frames. ], tot_loss[loss=0.08273, simple_loss=0.1029, pruned_loss=0.02128, audio_tagging_loss=0.01001, over 3046578.85 frames. ], batch size: 55, lr: 5.81e-03, grad_scale: 16.0 2023-11-20 02:38:31,388 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.17 vs. 
limit=22.5 2023-11-20 02:38:50,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=906786.6666666666, ans=0.5 2023-11-20 02:38:52,036 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 02:39:09,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=906920.0, ans=0.1 2023-11-20 02:39:15,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=906920.0, ans=0.125 2023-11-20 02:39:16,280 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 02:39:21,896 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 136050 2023-11-20 02:39:34,595 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 3800, loss[loss=0.08217, simple_loss=0.1031, pruned_loss=0.02148, audio_tagging_loss=0.009131, over 15882.00 frames. ], tot_loss[loss=0.08301, simple_loss=0.1031, pruned_loss=0.02142, audio_tagging_loss=0.01006, over 3047014.49 frames. ], batch size: 59, lr: 5.81e-03, grad_scale: 16.0 2023-11-20 02:39:41,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=907053.3333333334, ans=0.125 2023-11-20 02:39:42,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=907053.3333333334, ans=0.1 2023-11-20 02:39:49,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=907120.0, ans=0.2 2023-11-20 02:39:52,985 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.871e+01 8.426e+01 8.980e+01 9.690e+01 1.284e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-20 02:39:59,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=907186.6666666666, ans=0.0 2023-11-20 02:40:01,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=907186.6666666666, ans=0.1 2023-11-20 02:40:13,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=907253.3333333334, ans=0.125 2023-11-20 02:40:26,347 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 136100 2023-11-20 02:40:34,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=907320.0, ans=0.125 2023-11-20 02:40:38,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=907386.6666666666, ans=0.2 2023-11-20 02:40:39,674 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 3850, loss[loss=0.09999, simple_loss=0.1388, pruned_loss=0.02488, audio_tagging_loss=0.005698, over 15032.00 frames. ], tot_loss[loss=0.08399, simple_loss=0.1046, pruned_loss=0.02166, audio_tagging_loss=0.01003, over 3048524.18 frames. 
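[Editor's note] A few entries above, checkpoint.py writes checkpoint-136000.pt at the exact moment the global batch index reaches a round multiple. A sketch of that cadence; the 4000-batch interval and the saved payload are assumptions for illustration, not read from the code.

```python
import torch

def maybe_save_checkpoint(model, optimizer, batch_idx, exp_dir, every_n=4000):
    """Save a mid-epoch checkpoint on round-numbered batch indices (sketch)."""
    if batch_idx > 0 and batch_idx % every_n == 0:
        torch.save(
            {
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "batch_idx_train": batch_idx,
            },
            f"{exp_dir}/checkpoint-{batch_idx}.pt",
        )

# At batch idx 136000 this fires and produces .../checkpoint-136000.pt.
```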
], batch size: 55, lr: 5.81e-03, grad_scale: 16.0 2023-11-20 02:41:06,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=907520.0, ans=0.0 2023-11-20 02:41:17,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=907586.6666666666, ans=0.125 2023-11-20 02:41:21,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=907586.6666666666, ans=0.125 2023-11-20 02:41:31,472 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 136150 2023-11-20 02:41:36,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=907653.3333333334, ans=0.125 2023-11-20 02:41:37,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=907653.3333333334, ans=0.125 2023-11-20 02:41:43,667 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 3900, loss[loss=0.08731, simple_loss=0.09783, pruned_loss=0.02374, audio_tagging_loss=0.01466, over 15631.00 frames. ], tot_loss[loss=0.08419, simple_loss=0.1045, pruned_loss=0.02186, audio_tagging_loss=0.0101, over 3044427.22 frames. ], batch size: 60, lr: 5.81e-03, grad_scale: 16.0 2023-11-20 02:42:02,316 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.823e+01 8.198e+01 8.910e+01 9.669e+01 1.262e+02, threshold=1.782e+02, percent-clipped=0.0 2023-11-20 02:42:05,567 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.13 vs. limit=15.0 2023-11-20 02:42:07,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=907786.6666666666, ans=0.125 2023-11-20 02:42:08,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=907853.3333333334, ans=0.2 2023-11-20 02:42:11,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=907853.3333333334, ans=0.09899494936611666 2023-11-20 02:42:15,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=907853.3333333334, ans=0.125 2023-11-20 02:42:35,721 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 136200 2023-11-20 02:42:36,203 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.53 vs. limit=22.5 2023-11-20 02:42:45,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=907986.6666666666, ans=0.0 2023-11-20 02:42:49,267 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 3950, loss[loss=0.099, simple_loss=0.124, pruned_loss=0.02607, audio_tagging_loss=0.01095, over 15523.00 frames. ], tot_loss[loss=0.0835, simple_loss=0.1036, pruned_loss=0.02154, audio_tagging_loss=0.01016, over 3042323.73 frames. 
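[Editor's note] Each optim.py:476 entry reports the [0, 25, 50, 75, 100]-percent quantiles of recent gradient norms, and in every entry the threshold equals Clipping_scale times the median (e.g. 2.0 x 8.910e+01 = 1.782e+02 just above). The sketch below reproduces only this logged arithmetic; the real ScaledAdam bookkeeping in icefall's optim.py is more involved.

```python
import numpy as np

def clipping_report(grad_norms, clipping_scale=2.0):
    """Quantiles of recent grad norms and a median-based clipping threshold."""
    norms = np.asarray(grad_norms, dtype=np.float64)
    q = np.quantile(norms, [0.0, 0.25, 0.5, 0.75, 1.0])
    threshold = clipping_scale * q[2]              # 2.0 x median
    percent_clipped = 100.0 * np.mean(norms > threshold)
    return q, threshold, percent_clipped
```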
], batch size: 57, lr: 5.81e-03, grad_scale: 16.0 2023-11-20 02:43:23,925 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 02:43:26,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=908253.3333333334, ans=0.1 2023-11-20 02:43:40,669 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 136250 2023-11-20 02:43:43,837 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.54 vs. limit=12.0 2023-11-20 02:43:52,740 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 4000, loss[loss=0.1001, simple_loss=0.1314, pruned_loss=0.02606, audio_tagging_loss=0.008365, over 16013.00 frames. ], tot_loss[loss=0.08374, simple_loss=0.1036, pruned_loss=0.02164, audio_tagging_loss=0.01031, over 3052718.55 frames. ], batch size: 59, lr: 5.81e-03, grad_scale: 32.0 2023-11-20 02:43:59,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=908386.6666666666, ans=0.125 2023-11-20 02:44:01,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=908386.6666666666, ans=0.025 2023-11-20 02:44:02,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=908386.6666666666, ans=0.0 2023-11-20 02:44:11,156 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.441e+01 8.168e+01 8.877e+01 9.706e+01 2.567e+02, threshold=1.775e+02, percent-clipped=1.0 2023-11-20 02:44:17,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=908453.3333333334, ans=0.0 2023-11-20 02:44:19,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=908520.0, ans=0.95 2023-11-20 02:44:19,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=908520.0, ans=0.0 2023-11-20 02:44:22,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=908520.0, ans=0.04949747468305833 2023-11-20 02:44:40,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=908586.6666666666, ans=0.125 2023-11-20 02:44:43,681 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 02:44:44,747 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 136300 2023-11-20 02:44:47,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=908653.3333333334, ans=0.0 2023-11-20 02:44:52,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=908653.3333333334, ans=0.125 2023-11-20 02:44:54,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=908653.3333333334, ans=0.125 2023-11-20 02:44:57,475 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 4050, loss[loss=0.09949, simple_loss=0.1161, pruned_loss=0.02891, audio_tagging_loss=0.01254, over 14977.00 frames. 
], tot_loss[loss=0.08466, simple_loss=0.1048, pruned_loss=0.02191, audio_tagging_loss=0.01034, over 3053823.62 frames. ], batch size: 54, lr: 5.81e-03, grad_scale: 32.0 2023-11-20 02:45:01,206 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 02:45:20,388 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.45 vs. limit=15.0 2023-11-20 02:45:25,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=908853.3333333334, ans=0.125 2023-11-20 02:45:49,221 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 136350 2023-11-20 02:46:01,977 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 4100, loss[loss=0.07366, simple_loss=0.08422, pruned_loss=0.01974, audio_tagging_loss=0.01181, over 16215.00 frames. ], tot_loss[loss=0.08451, simple_loss=0.1046, pruned_loss=0.02185, audio_tagging_loss=0.01037, over 3053593.78 frames. ], batch size: 64, lr: 5.81e-03, grad_scale: 32.0 2023-11-20 02:46:19,911 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.593e+01 8.318e+01 8.893e+01 9.801e+01 1.256e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-20 02:46:35,453 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.27 vs. limit=12.0 2023-11-20 02:46:50,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=909253.3333333334, ans=0.05 2023-11-20 02:46:54,593 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 136400 2023-11-20 02:47:07,347 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 4150, loss[loss=0.09103, simple_loss=0.1157, pruned_loss=0.02607, audio_tagging_loss=0.007099, over 17501.00 frames. ], tot_loss[loss=0.08422, simple_loss=0.1045, pruned_loss=0.02179, audio_tagging_loss=0.01018, over 3047770.86 frames. ], batch size: 67, lr: 5.81e-03, grad_scale: 32.0 2023-11-20 02:47:11,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=909386.6666666666, ans=0.0 2023-11-20 02:47:41,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=909520.0, ans=0.125 2023-11-20 02:47:50,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=909586.6666666666, ans=0.1 2023-11-20 02:47:55,242 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 02:47:57,301 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2023-11-20 02:47:59,056 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 136450 2023-11-20 02:48:02,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=909653.3333333334, ans=0.125 2023-11-20 02:48:09,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=909653.3333333334, ans=0.0 2023-11-20 02:48:12,032 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 4200, loss[loss=0.08166, simple_loss=0.0997, pruned_loss=0.02338, audio_tagging_loss=0.00843, over 15300.00 frames. ], tot_loss[loss=0.08416, simple_loss=0.1045, pruned_loss=0.02185, audio_tagging_loss=0.01006, over 3045210.85 frames. ], batch size: 58, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:48:23,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=909720.0, ans=0.125 2023-11-20 02:48:30,384 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.105e+01 8.354e+01 9.339e+01 1.014e+02 1.353e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-20 02:48:35,294 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.99 vs. limit=15.0 2023-11-20 02:49:04,017 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 136500 2023-11-20 02:49:16,881 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 4250, loss[loss=0.1114, simple_loss=0.1317, pruned_loss=0.03591, audio_tagging_loss=0.009659, over 15505.00 frames. ], tot_loss[loss=0.08341, simple_loss=0.1037, pruned_loss=0.02163, audio_tagging_loss=0.009931, over 3049370.40 frames. ], batch size: 56, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:49:26,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=910053.3333333334, ans=0.0 2023-11-20 02:49:28,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=910120.0, ans=0.2 2023-11-20 02:49:31,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=910120.0, ans=0.0 2023-11-20 02:50:03,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=910253.3333333334, ans=0.125 2023-11-20 02:50:08,494 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 136550 2023-11-20 02:50:21,294 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 4300, loss[loss=0.06361, simple_loss=0.0827, pruned_loss=0.01153, audio_tagging_loss=0.01073, over 15762.00 frames. ], tot_loss[loss=0.08376, simple_loss=0.1044, pruned_loss=0.02168, audio_tagging_loss=0.009876, over 3053971.50 frames. 
], batch size: 57, lr: 5.80e-03, grad_scale: 16.0 2023-11-20 02:50:21,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=910386.6666666666, ans=0.0 2023-11-20 02:50:26,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=910386.6666666666, ans=0.0 2023-11-20 02:50:32,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=910386.6666666666, ans=0.2 2023-11-20 02:50:39,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=910453.3333333334, ans=0.1 2023-11-20 02:50:40,217 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.215e+01 8.438e+01 9.081e+01 9.991e+01 1.404e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-20 02:50:53,212 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.39 vs. limit=15.0 2023-11-20 02:51:02,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=910586.6666666666, ans=0.1 2023-11-20 02:51:03,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=910586.6666666666, ans=0.125 2023-11-20 02:51:09,583 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.69 vs. limit=15.0 2023-11-20 02:51:12,553 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 136600 2023-11-20 02:51:18,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=910653.3333333334, ans=0.1 2023-11-20 02:51:25,585 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 4350, loss[loss=0.08456, simple_loss=0.1032, pruned_loss=0.02248, audio_tagging_loss=0.0105, over 16232.00 frames. ], tot_loss[loss=0.0834, simple_loss=0.104, pruned_loss=0.02146, audio_tagging_loss=0.009931, over 3061088.71 frames. ], batch size: 64, lr: 5.80e-03, grad_scale: 16.0 2023-11-20 02:51:53,751 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.10 vs. limit=22.5 2023-11-20 02:52:09,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=910920.0, ans=0.125 2023-11-20 02:52:13,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=910920.0, ans=0.125 2023-11-20 02:52:17,050 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 136650 2023-11-20 02:52:17,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=910986.6666666666, ans=0.1 2023-11-20 02:52:30,330 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 4400, loss[loss=0.08354, simple_loss=0.09999, pruned_loss=0.02208, audio_tagging_loss=0.01146, over 14374.00 frames. ], tot_loss[loss=0.08385, simple_loss=0.1046, pruned_loss=0.02156, audio_tagging_loss=0.009972, over 3059299.93 frames. 
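[Editor's note] The grad_scale field in the per-batch entries flips between 16.0 and 32.0 through this stretch, the signature of dynamic fp16 loss scaling: the scale doubles after a run of overflow-free steps and backs off when a step produces inf/nan gradients. A sketch using torch.cuda.amp.GradScaler; the initial scale and growth interval here are illustrative, and the run may manage its scaler differently.

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0,      # matches the values seen in this log
    growth_factor=2.0,    # 16.0 -> 32.0 after growth_interval clean steps
    backoff_factor=0.5,   # 32.0 -> 16.0 on an inf/nan gradient
    growth_interval=2000,
)
# Per step: scaler.scale(loss).backward()
#           scaler.step(optimizer); scaler.update()
```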
], batch size: 54, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:52:49,555 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.680e+01 7.985e+01 8.679e+01 9.460e+01 1.350e+02, threshold=1.736e+02, percent-clipped=0.0 2023-11-20 02:52:56,909 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.46 vs. limit=15.0 2023-11-20 02:52:59,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=911186.6666666666, ans=0.2 2023-11-20 02:53:21,552 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 136700 2023-11-20 02:53:27,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=911320.0, ans=0.2 2023-11-20 02:53:28,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=911320.0, ans=0.2 2023-11-20 02:53:34,322 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 4450, loss[loss=0.08105, simple_loss=0.1063, pruned_loss=0.01796, audio_tagging_loss=0.009956, over 16442.00 frames. ], tot_loss[loss=0.08357, simple_loss=0.1042, pruned_loss=0.02145, audio_tagging_loss=0.01003, over 3062553.82 frames. ], batch size: 61, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:53:48,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=911453.3333333334, ans=0.0 2023-11-20 02:53:54,294 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.21 vs. limit=15.0 2023-11-20 02:54:10,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=911520.0, ans=0.125 2023-11-20 02:54:14,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=911586.6666666666, ans=0.125 2023-11-20 02:54:24,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=911653.3333333334, ans=0.0 2023-11-20 02:54:25,585 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 136750 2023-11-20 02:54:33,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=911653.3333333334, ans=0.125 2023-11-20 02:54:34,107 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.33 vs. limit=22.5 2023-11-20 02:54:38,264 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 4500, loss[loss=0.08414, simple_loss=0.1067, pruned_loss=0.01987, audio_tagging_loss=0.01091, over 14707.00 frames. ], tot_loss[loss=0.08319, simple_loss=0.1039, pruned_loss=0.02121, audio_tagging_loss=0.01, over 3062351.73 frames. 
], batch size: 53, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:54:49,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=911786.6666666666, ans=0.2 2023-11-20 02:54:55,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=911786.6666666666, ans=0.2 2023-11-20 02:54:57,901 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.729e+01 8.079e+01 8.708e+01 9.513e+01 1.325e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-20 02:55:29,837 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 136800 2023-11-20 02:55:30,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=911986.6666666666, ans=0.09899494936611666 2023-11-20 02:55:43,000 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 4550, loss[loss=0.05609, simple_loss=0.06971, pruned_loss=0.01177, audio_tagging_loss=0.009472, over 15394.00 frames. ], tot_loss[loss=0.08269, simple_loss=0.1031, pruned_loss=0.02109, audio_tagging_loss=0.01005, over 3051019.32 frames. ], batch size: 58, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:55:43,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=912053.3333333334, ans=0.125 2023-11-20 02:55:58,375 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.55 vs. limit=15.0 2023-11-20 02:56:09,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=912186.6666666666, ans=0.125 2023-11-20 02:56:32,671 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 02:56:35,234 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 136850 2023-11-20 02:56:35,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=912320.0, ans=0.0 2023-11-20 02:56:36,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=912320.0, ans=0.0 2023-11-20 02:56:42,441 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.81 vs. limit=15.0 2023-11-20 02:56:45,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=912320.0, ans=0.125 2023-11-20 02:56:48,559 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 4600, loss[loss=0.08659, simple_loss=0.1179, pruned_loss=0.02057, audio_tagging_loss=0.007045, over 15309.00 frames. ], tot_loss[loss=0.08258, simple_loss=0.1026, pruned_loss=0.02114, audio_tagging_loss=0.01012, over 3057043.04 frames. 
], batch size: 57, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:57:00,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=912453.3333333334, ans=0.125 2023-11-20 02:57:00,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=912453.3333333334, ans=0.0 2023-11-20 02:57:07,910 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.584e+01 7.967e+01 8.569e+01 9.519e+01 1.814e+02, threshold=1.714e+02, percent-clipped=1.0 2023-11-20 02:57:08,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=912453.3333333334, ans=0.1 2023-11-20 02:57:17,497 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.24 vs. limit=15.0 2023-11-20 02:57:20,116 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.14 vs. limit=15.0 2023-11-20 02:57:41,172 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 136900 2023-11-20 02:57:42,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=912653.3333333334, ans=0.125 2023-11-20 02:57:44,220 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.67 vs. limit=12.0 2023-11-20 02:57:47,571 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=22.5 2023-11-20 02:57:53,899 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 4650, loss[loss=0.1123, simple_loss=0.1462, pruned_loss=0.02687, audio_tagging_loss=0.01235, over 14859.00 frames. ], tot_loss[loss=0.0827, simple_loss=0.1026, pruned_loss=0.02124, audio_tagging_loss=0.01016, over 3046322.74 frames. ], batch size: 54, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:57:55,768 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.63 vs. limit=15.0 2023-11-20 02:58:08,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=912786.6666666666, ans=0.2 2023-11-20 02:58:14,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=912786.6666666666, ans=0.125 2023-11-20 02:58:18,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=912853.3333333334, ans=0.125 2023-11-20 02:58:31,312 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.54 vs. limit=15.0 2023-11-20 02:58:44,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=912986.6666666666, ans=0.95 2023-11-20 02:58:46,020 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 136950 2023-11-20 02:58:47,915 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. 
limit=6.0 2023-11-20 02:58:51,283 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.83 vs. limit=22.5 2023-11-20 02:58:54,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=912986.6666666666, ans=0.0 2023-11-20 02:58:58,312 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 4700, loss[loss=0.1121, simple_loss=0.1292, pruned_loss=0.03699, audio_tagging_loss=0.01053, over 15626.00 frames. ], tot_loss[loss=0.08275, simple_loss=0.1026, pruned_loss=0.02126, audio_tagging_loss=0.01017, over 3045235.80 frames. ], batch size: 56, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 02:59:08,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=913053.3333333334, ans=0.125 2023-11-20 02:59:17,788 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.469e+01 8.087e+01 8.546e+01 9.453e+01 1.405e+02, threshold=1.709e+02, percent-clipped=0.0 2023-11-20 02:59:19,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=913120.0, ans=0.1 2023-11-20 02:59:29,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=913186.6666666666, ans=0.0 2023-11-20 02:59:49,513 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 137000 2023-11-20 03:00:03,335 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 4750, loss[loss=0.09747, simple_loss=0.1173, pruned_loss=0.02736, audio_tagging_loss=0.01145, over 16075.00 frames. ], tot_loss[loss=0.08201, simple_loss=0.1016, pruned_loss=0.02089, audio_tagging_loss=0.01032, over 3042240.48 frames. ], batch size: 60, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:00:09,736 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:00:34,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=913520.0, ans=0.125 2023-11-20 03:00:54,164 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.85 vs. limit=15.0 2023-11-20 03:00:54,755 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 137050 2023-11-20 03:01:07,318 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 4800, loss[loss=0.09776, simple_loss=0.122, pruned_loss=0.02348, audio_tagging_loss=0.01328, over 15669.00 frames. ], tot_loss[loss=0.08231, simple_loss=0.1019, pruned_loss=0.02103, audio_tagging_loss=0.01031, over 3042366.09 frames. 
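The optim.py:476 entries report quartiles of recent gradient norms together with a clipping threshold, and in every entry the threshold equals Clipping_scale times the median (here 2.0 * 8.546e+01 = 1.709e+02). A simplified stand-in for that diagnostic, not the actual optimizer code; percent-clipped then reports how often the rescaling fired over the logging window.

import torch

# Sketch only: median-based gradient clipping with quartile reporting.
def clip_by_median(grad: torch.Tensor, recent_norms: torch.Tensor,
                   clipping_scale: float = 2.0):
    quartiles = torch.quantile(
        recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2]   # scale * median norm
    norm = grad.norm()
    if norm > threshold:                        # counted in percent-clipped
        grad = grad * (threshold / norm)        # rescale down to threshold
    return grad, quartiles, threshold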
], batch size: 57, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:01:11,300 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:01:23,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=913786.6666666666, ans=0.125 2023-11-20 03:01:25,565 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.113e+01 8.051e+01 8.703e+01 9.476e+01 1.263e+02, threshold=1.741e+02, percent-clipped=0.0 2023-11-20 03:01:53,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=913920.0, ans=0.125 2023-11-20 03:01:58,937 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 137100 2023-11-20 03:02:00,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=913986.6666666666, ans=0.125 2023-11-20 03:02:11,287 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 4850, loss[loss=0.09306, simple_loss=0.1268, pruned_loss=0.02097, audio_tagging_loss=0.008677, over 14611.00 frames. ], tot_loss[loss=0.08229, simple_loss=0.102, pruned_loss=0.02082, audio_tagging_loss=0.01045, over 3043087.16 frames. ], batch size: 53, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:02:20,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=914053.3333333334, ans=0.2 2023-11-20 03:02:40,505 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.40 vs. limit=12.0 2023-11-20 03:02:51,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=914253.3333333334, ans=0.025 2023-11-20 03:02:53,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=914253.3333333334, ans=0.125 2023-11-20 03:03:02,577 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 137150 2023-11-20 03:03:15,884 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 4900, loss[loss=0.05857, simple_loss=0.06428, pruned_loss=0.01273, audio_tagging_loss=0.0137, over 15248.00 frames. ], tot_loss[loss=0.08162, simple_loss=0.1009, pruned_loss=0.02061, audio_tagging_loss=0.01054, over 3038697.03 frames. 
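The scaling.py:213 lines dump ScheduledFloat values: module constants such as dropout probabilities, skip rates, and bypass scale minima that are functions of the global batch count rather than fixed hyperparameters. A sketch of the piecewise-linear schedule such a value typically follows; the breakpoints below are invented for illustration, and the real schedules are defined with the modules in scaling.py.

# Sketch only: a float that interpolates linearly between (batch, value)
# breakpoints and holds the final value afterwards.
def scheduled_float(batch_count: float,
                    points=((0.0, 0.3), (20000.0, 0.1))) -> float:
    x0, y0 = points[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in points[1:]:
        if batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
        x0, y0 = x1, y1
    return y0

# At batch_count ~ 9.1e5 every schedule here is far past its breakpoints,
# which is why the logged ans values sit at constants like 0.1, 0.2, 0.0.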
], batch size: 57, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:03:35,171 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.551e+01 8.052e+01 8.825e+01 9.558e+01 1.326e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-20 03:03:46,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=914520.0, ans=0.035 2023-11-20 03:03:50,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=914520.0, ans=0.1 2023-11-20 03:03:51,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=914520.0, ans=0.2 2023-11-20 03:04:07,055 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 137200 2023-11-20 03:04:15,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=914653.3333333334, ans=0.05 2023-11-20 03:04:20,309 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 4950, loss[loss=0.07808, simple_loss=0.1093, pruned_loss=0.01585, audio_tagging_loss=0.007555, over 16142.00 frames. ], tot_loss[loss=0.08194, simple_loss=0.1015, pruned_loss=0.0208, audio_tagging_loss=0.01041, over 3036288.33 frames. ], batch size: 61, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:04:23,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=914720.0, ans=0.125 2023-11-20 03:04:31,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=914720.0, ans=0.0 2023-11-20 03:04:49,270 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.89 vs. limit=15.0 2023-11-20 03:04:56,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=914853.3333333334, ans=0.2 2023-11-20 03:04:58,097 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.730e-01 2023-11-20 03:04:59,694 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.03 vs. limit=15.0 2023-11-20 03:05:12,469 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 137250 2023-11-20 03:05:22,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=914986.6666666666, ans=0.125 2023-11-20 03:05:23,984 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.17 vs. limit=22.5 2023-11-20 03:05:24,391 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 5000, loss[loss=0.1049, simple_loss=0.1331, pruned_loss=0.03105, audio_tagging_loss=0.007289, over 16818.00 frames. ], tot_loss[loss=0.08211, simple_loss=0.1023, pruned_loss=0.02081, audio_tagging_loss=0.01017, over 3034454.91 frames. ], batch size: 60, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:05:31,124 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.73 vs. 
limit=6.0 2023-11-20 03:05:43,918 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.699e+01 7.904e+01 8.718e+01 9.618e+01 1.428e+02, threshold=1.744e+02, percent-clipped=0.0 2023-11-20 03:05:45,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=915120.0, ans=0.2 2023-11-20 03:05:50,904 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.42 vs. limit=22.5 2023-11-20 03:06:06,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=915253.3333333334, ans=0.125 2023-11-20 03:06:11,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=915253.3333333334, ans=0.125 2023-11-20 03:06:15,469 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 137300 2023-11-20 03:06:28,279 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 5050, loss[loss=0.07913, simple_loss=0.08937, pruned_loss=0.02196, audio_tagging_loss=0.01249, over 14916.00 frames. ], tot_loss[loss=0.08188, simple_loss=0.1021, pruned_loss=0.02077, audio_tagging_loss=0.01008, over 3037038.58 frames. ], batch size: 58, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:06:32,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=915386.6666666666, ans=0.1 2023-11-20 03:06:58,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=915520.0, ans=10.0 2023-11-20 03:06:59,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=915520.0, ans=0.0 2023-11-20 03:07:00,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=915520.0, ans=0.125 2023-11-20 03:07:06,391 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.90 vs. limit=10.0 2023-11-20 03:07:20,285 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 137350 2023-11-20 03:07:32,313 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 5100, loss[loss=0.06305, simple_loss=0.07414, pruned_loss=0.0168, audio_tagging_loss=0.009176, over 15729.00 frames. ], tot_loss[loss=0.08258, simple_loss=0.1028, pruned_loss=0.02115, audio_tagging_loss=0.01003, over 3045393.89 frames. ], batch size: 61, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:07:38,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=915720.0, ans=0.1 2023-11-20 03:07:51,975 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.683e+01 7.866e+01 8.491e+01 9.250e+01 1.522e+02, threshold=1.698e+02, percent-clipped=0.0 2023-11-20 03:07:59,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=915853.3333333334, ans=0.1 2023-11-20 03:08:04,212 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.29 vs. 
limit=15.0 2023-11-20 03:08:14,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=915920.0, ans=0.125 2023-11-20 03:08:24,143 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 137400 2023-11-20 03:08:37,461 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 5150, loss[loss=0.08835, simple_loss=0.1113, pruned_loss=0.0225, audio_tagging_loss=0.01019, over 16396.00 frames. ], tot_loss[loss=0.08205, simple_loss=0.1025, pruned_loss=0.02091, audio_tagging_loss=0.009908, over 3041860.45 frames. ], batch size: 61, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:08:41,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=916053.3333333334, ans=0.125 2023-11-20 03:09:02,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=916186.6666666666, ans=0.0 2023-11-20 03:09:27,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=916253.3333333334, ans=10.0 2023-11-20 03:09:29,568 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 137450 2023-11-20 03:09:40,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=916320.0, ans=0.0 2023-11-20 03:09:42,209 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 5200, loss[loss=0.07789, simple_loss=0.1014, pruned_loss=0.01791, audio_tagging_loss=0.00927, over 15151.00 frames. ], tot_loss[loss=0.08309, simple_loss=0.104, pruned_loss=0.02121, audio_tagging_loss=0.009876, over 3049950.09 frames. ], batch size: 55, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:09:46,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=916386.6666666666, ans=0.125 2023-11-20 03:10:01,269 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.268e+01 8.182e+01 8.792e+01 9.724e+01 1.387e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-20 03:10:18,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=916520.0, ans=0.0 2023-11-20 03:10:30,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=916586.6666666666, ans=0.125 2023-11-20 03:10:34,210 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 137500 2023-11-20 03:10:46,747 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 5250, loss[loss=0.07919, simple_loss=0.1045, pruned_loss=0.01783, audio_tagging_loss=0.009124, over 14807.00 frames. ], tot_loss[loss=0.0832, simple_loss=0.1043, pruned_loss=0.02123, audio_tagging_loss=0.00984, over 3050917.36 frames. 
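The Whitening entries (e.g. metric=3.29 vs. limit=15.0 just above) compare a whiteness statistic of a module's activations against a limit; the module leaves the activations alone while the metric stays under the limit, so most of these lines are pure bookkeeping. A hedged paraphrase of the statistic, assuming it measures the eigenvalue spread of the per-group channel covariance, with 1.0 meaning perfectly white; this sketches the idea rather than the exact scaling.py code.

import torch

# Sketch only: >= 1.0, and equal to 1.0 when all covariance eigenvalues match.
def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations for one whitening group
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return (eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20)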
], batch size: 55, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:10:56,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=916720.0, ans=0.0 2023-11-20 03:10:59,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=916786.6666666666, ans=0.1 2023-11-20 03:11:19,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=916853.3333333334, ans=0.0 2023-11-20 03:11:38,206 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 137550 2023-11-20 03:11:38,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=916986.6666666666, ans=0.07 2023-11-20 03:11:39,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=916986.6666666666, ans=0.1 2023-11-20 03:11:51,370 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 5300, loss[loss=0.04979, simple_loss=0.0612, pruned_loss=0.007849, audio_tagging_loss=0.01134, over 13865.00 frames. ], tot_loss[loss=0.08262, simple_loss=0.1034, pruned_loss=0.02098, audio_tagging_loss=0.009951, over 3047983.47 frames. ], batch size: 55, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:12:01,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=917053.3333333334, ans=0.0 2023-11-20 03:12:03,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=917120.0, ans=0.125 2023-11-20 03:12:10,536 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.799e+01 8.318e+01 9.072e+01 9.912e+01 1.242e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-20 03:12:10,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=917120.0, ans=0.2 2023-11-20 03:12:22,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=917186.6666666666, ans=0.1 2023-11-20 03:12:23,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=917186.6666666666, ans=0.0 2023-11-20 03:12:36,793 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.38 vs. limit=15.0 2023-11-20 03:12:43,190 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 137600 2023-11-20 03:12:49,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=917320.0, ans=0.0 2023-11-20 03:12:56,059 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 5350, loss[loss=0.07411, simple_loss=0.08931, pruned_loss=0.01886, audio_tagging_loss=0.01059, over 14814.00 frames. ], tot_loss[loss=0.08281, simple_loss=0.1037, pruned_loss=0.02095, audio_tagging_loss=0.01002, over 3046991.39 frames. ], batch size: 55, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:13:08,476 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.36 vs. 
limit=12.0 2023-11-20 03:13:10,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=917453.3333333334, ans=0.125 2023-11-20 03:13:18,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=917453.3333333334, ans=0.1 2023-11-20 03:13:28,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=917520.0, ans=0.0 2023-11-20 03:13:36,824 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.22 vs. limit=10.0 2023-11-20 03:13:46,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=917653.3333333334, ans=0.1 2023-11-20 03:13:47,649 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 137650 2023-11-20 03:13:55,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=917653.3333333334, ans=0.2 2023-11-20 03:14:00,431 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 5400, loss[loss=0.07642, simple_loss=0.09771, pruned_loss=0.01848, audio_tagging_loss=0.009084, over 14592.00 frames. ], tot_loss[loss=0.08333, simple_loss=0.1041, pruned_loss=0.0212, audio_tagging_loss=0.01007, over 3045541.41 frames. ], batch size: 53, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:14:19,180 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.643e+01 8.272e+01 8.874e+01 9.617e+01 1.716e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-20 03:14:35,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=917853.3333333334, ans=0.125 2023-11-20 03:14:41,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=917920.0, ans=0.125 2023-11-20 03:14:51,527 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 137700 2023-11-20 03:15:04,134 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 5450, loss[loss=0.09298, simple_loss=0.1226, pruned_loss=0.02384, audio_tagging_loss=0.007854, over 16172.00 frames. ], tot_loss[loss=0.08356, simple_loss=0.1041, pruned_loss=0.02144, audio_tagging_loss=0.01005, over 3045046.76 frames. 
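The lr column decays very slowly (5.80e-03 down to 5.78e-03 over a few hundred batches), which is consistent with an Eden-style schedule driven jointly by batch and epoch counts. A sketch under that assumption, with the constants taken from the startup configuration; treat the formula as a plausible reading of the log rather than a confirmed one.

# Sketch only: Eden-style decay in both step and epoch.
def eden_lr(base_lr: float, step: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# eden_lr(0.045, step=137_000, epoch=11.5) ~ 5.7e-03, within a couple of
# percent of the logged 5.79e-03.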
], batch size: 57, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:15:18,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=918120.0, ans=0.125 2023-11-20 03:15:22,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=918120.0, ans=0.0 2023-11-20 03:15:30,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=918186.6666666666, ans=0.125 2023-11-20 03:15:30,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=918186.6666666666, ans=0.0 2023-11-20 03:15:41,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=918253.3333333334, ans=0.0 2023-11-20 03:15:55,378 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 137750 2023-11-20 03:16:00,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=918320.0, ans=0.125 2023-11-20 03:16:08,292 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 5500, loss[loss=0.05879, simple_loss=0.06558, pruned_loss=0.01153, audio_tagging_loss=0.01447, over 16156.00 frames. ], tot_loss[loss=0.08379, simple_loss=0.1045, pruned_loss=0.02151, audio_tagging_loss=0.01003, over 3044818.08 frames. ], batch size: 62, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:16:22,421 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.87 vs. limit=15.0 2023-11-20 03:16:25,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=918453.3333333334, ans=0.1 2023-11-20 03:16:27,772 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.847e+01 8.238e+01 9.064e+01 9.896e+01 2.099e+02, threshold=1.813e+02, percent-clipped=1.0 2023-11-20 03:16:55,208 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.44 vs. limit=15.0 2023-11-20 03:17:00,397 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 137800 2023-11-20 03:17:13,443 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 5550, loss[loss=0.1058, simple_loss=0.1333, pruned_loss=0.02986, audio_tagging_loss=0.00931, over 14890.00 frames. ], tot_loss[loss=0.08333, simple_loss=0.1035, pruned_loss=0.0213, audio_tagging_loss=0.01028, over 3050715.71 frames. ], batch size: 55, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:17:20,254 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.21 vs. limit=10.0 2023-11-20 03:17:27,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=918786.6666666666, ans=0.125 2023-11-20 03:17:39,943 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.59 vs. 
limit=22.5 2023-11-20 03:17:40,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=918853.3333333334, ans=0.125 2023-11-20 03:17:54,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=918920.0, ans=0.04949747468305833 2023-11-20 03:17:58,657 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:18:04,339 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 137850 2023-11-20 03:18:16,834 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 5600, loss[loss=0.07576, simple_loss=0.09506, pruned_loss=0.01768, audio_tagging_loss=0.01055, over 13712.00 frames. ], tot_loss[loss=0.08322, simple_loss=0.1035, pruned_loss=0.02113, audio_tagging_loss=0.01032, over 3052560.68 frames. ], batch size: 53, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:18:25,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=919053.3333333334, ans=0.125 2023-11-20 03:18:35,342 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.491e+01 8.234e+01 9.061e+01 1.016e+02 1.381e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-20 03:18:49,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=919186.6666666666, ans=0.125 2023-11-20 03:19:02,532 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 03:19:04,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=919253.3333333334, ans=0.0 2023-11-20 03:19:07,338 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 137900 2023-11-20 03:19:14,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=919320.0, ans=0.1 2023-11-20 03:19:19,191 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 5650, loss[loss=0.1116, simple_loss=0.1444, pruned_loss=0.03385, audio_tagging_loss=0.005527, over 16090.00 frames. ], tot_loss[loss=0.08321, simple_loss=0.1034, pruned_loss=0.0211, audio_tagging_loss=0.01039, over 3045646.86 frames. ], batch size: 56, lr: 5.77e-03, grad_scale: 32.0 2023-11-20 03:19:28,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=919386.6666666666, ans=10.0 2023-11-20 03:19:29,053 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.72 vs. 
limit=15.0 2023-11-20 03:19:37,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=919453.3333333334, ans=0.0 2023-11-20 03:20:09,967 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 137950 2023-11-20 03:20:13,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=919653.3333333334, ans=0.125 2023-11-20 03:20:23,360 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 5700, loss[loss=0.07749, simple_loss=0.09561, pruned_loss=0.01853, audio_tagging_loss=0.01116, over 14993.00 frames. ], tot_loss[loss=0.08325, simple_loss=0.1035, pruned_loss=0.02117, audio_tagging_loss=0.01035, over 3042224.26 frames. ], batch size: 57, lr: 5.77e-03, grad_scale: 32.0 2023-11-20 03:20:28,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=919720.0, ans=0.1 2023-11-20 03:20:34,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=919786.6666666666, ans=0.125 2023-11-20 03:20:42,397 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.375e+01 8.900e+01 9.766e+01 1.297e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-20 03:20:45,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=919786.6666666666, ans=0.1 2023-11-20 03:20:45,485 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.54 vs. limit=15.0 2023-11-20 03:20:48,016 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.14 vs. limit=12.0 2023-11-20 03:21:12,249 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.82 vs. limit=10.0 2023-11-20 03:21:15,205 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 138000 2023-11-20 03:21:28,283 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 5750, loss[loss=0.09816, simple_loss=0.1443, pruned_loss=0.02143, audio_tagging_loss=0.0046, over 15944.00 frames. ], tot_loss[loss=0.08283, simple_loss=0.1031, pruned_loss=0.02107, audio_tagging_loss=0.0102, over 3051381.42 frames. ], batch size: 56, lr: 5.77e-03, grad_scale: 32.0 2023-11-20 03:21:46,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=920120.0, ans=0.1 2023-11-20 03:22:09,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=920253.3333333334, ans=0.1 2023-11-20 03:22:16,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=920253.3333333334, ans=0.2 2023-11-20 03:22:18,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=920320.0, ans=0.0 2023-11-20 03:22:19,754 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 138050 2023-11-20 03:22:21,655 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.90 vs. 
limit=15.0 2023-11-20 03:22:25,282 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.89 vs. limit=6.0 2023-11-20 03:22:26,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=920320.0, ans=0.125 2023-11-20 03:22:29,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=920320.0, ans=0.035 2023-11-20 03:22:31,827 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 5800, loss[loss=0.07934, simple_loss=0.09649, pruned_loss=0.02168, audio_tagging_loss=0.009406, over 16732.00 frames. ], tot_loss[loss=0.08342, simple_loss=0.104, pruned_loss=0.02139, audio_tagging_loss=0.01003, over 3050436.24 frames. ], batch size: 62, lr: 5.77e-03, grad_scale: 32.0 2023-11-20 03:22:32,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=920386.6666666666, ans=0.2 2023-11-20 03:22:51,904 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.64 vs. limit=15.0 2023-11-20 03:22:52,123 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.779e+01 8.315e+01 8.891e+01 9.653e+01 1.172e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-20 03:22:56,434 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2023-11-20 03:23:00,032 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.87 vs. limit=22.5 2023-11-20 03:23:02,595 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.19 vs. limit=15.0 2023-11-20 03:23:03,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=920520.0, ans=0.2 2023-11-20 03:23:03,386 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:23:05,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=920520.0, ans=0.0 2023-11-20 03:23:14,874 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:23:23,084 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 138100 2023-11-20 03:23:23,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=920653.3333333334, ans=0.1 2023-11-20 03:23:24,688 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0 2023-11-20 03:23:31,625 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.83 vs. limit=22.5 2023-11-20 03:23:36,515 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 5850, loss[loss=0.07868, simple_loss=0.1029, pruned_loss=0.01961, audio_tagging_loss=0.007611, over 14855.00 frames. ], tot_loss[loss=0.08256, simple_loss=0.1028, pruned_loss=0.0212, audio_tagging_loss=0.009958, over 3044061.37 frames. 
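This run trains in fp16 under a dynamic loss scaler, and the grad_scale column tracks its current scale: in the entries just below it halves from 32.0 to 16.0, the standard reaction to a batch that produced non-finite fp16 gradients, and it is grown back to 32.0 around batch 6400 further down. A sketch of that mechanism using stock torch.cuda.amp, not code lifted from train_asr.py.

import torch

# Sketch only: dynamic loss scaling as provided by torch.cuda.amp.
scaler = torch.cuda.amp.GradScaler(init_scale=32.0,
                                   growth_factor=2.0,     # 16.0 -> 32.0
                                   backoff_factor=0.5,    # 32.0 -> 16.0
                                   growth_interval=2000)

def training_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped if any gradient is inf/nan
    scaler.update()          # halves or grows the scale accordingly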
], batch size: 54, lr: 5.77e-03, grad_scale: 32.0 2023-11-20 03:23:48,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=920786.6666666666, ans=0.0 2023-11-20 03:23:52,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=920786.6666666666, ans=0.025 2023-11-20 03:23:59,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=920786.6666666666, ans=0.1 2023-11-20 03:24:24,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=920920.0, ans=0.0 2023-11-20 03:24:27,845 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 138150 2023-11-20 03:24:39,929 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 5900, loss[loss=0.08138, simple_loss=0.1071, pruned_loss=0.01625, audio_tagging_loss=0.01157, over 15894.00 frames. ], tot_loss[loss=0.0818, simple_loss=0.1021, pruned_loss=0.02074, audio_tagging_loss=0.009998, over 3048392.33 frames. ], batch size: 58, lr: 5.77e-03, grad_scale: 16.0 2023-11-20 03:24:41,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=921053.3333333334, ans=0.125 2023-11-20 03:24:45,617 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:24:59,819 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.588e+01 8.195e+01 8.943e+01 1.006e+02 1.652e+02, threshold=1.789e+02, percent-clipped=0.0 2023-11-20 03:25:16,801 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.00 vs. limit=15.0 2023-11-20 03:25:31,450 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 138200 2023-11-20 03:25:43,827 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 5950, loss[loss=0.06852, simple_loss=0.08931, pruned_loss=0.01378, audio_tagging_loss=0.01009, over 15261.00 frames. ], tot_loss[loss=0.08172, simple_loss=0.1019, pruned_loss=0.02071, audio_tagging_loss=0.01005, over 3051150.22 frames. ], batch size: 57, lr: 5.77e-03, grad_scale: 16.0 2023-11-20 03:26:03,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=921453.3333333334, ans=0.0 2023-11-20 03:26:05,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=921453.3333333334, ans=0.125 2023-11-20 03:26:26,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=921586.6666666666, ans=0.0 2023-11-20 03:26:34,906 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 138250 2023-11-20 03:26:41,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=921653.3333333334, ans=0.125 2023-11-20 03:26:47,501 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 6000, loss[loss=0.07483, simple_loss=0.09828, pruned_loss=0.0187, audio_tagging_loss=0.006989, over 14733.00 frames. ], tot_loss[loss=0.08181, simple_loss=0.102, pruned_loss=0.02083, audio_tagging_loss=0.009973, over 3045412.42 frames. 
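Batch 6000 lands on a validation boundary, so the block that follows switches to "Computing validation loss", dumps an attention-weights entropy tensor as a model diagnostic, and reports a frame-normalized validation loss over ~4.7M frames plus the peak GPU memory. A minimal sketch of such a pass; the names are illustrative and the real loop is train_asr.py's.

import torch

# Sketch only: frame-normalized validation loss without gradient tracking.
@torch.no_grad()
def compute_validation_loss(model, valid_loader, loss_fn) -> float:
    model.eval()
    tot_loss, tot_frames = 0.0, 0
    for batch in valid_loader:
        loss, num_frames = loss_fn(model, batch)  # summed loss, frame count
        tot_loss += loss.item()
        tot_frames += num_frames
    model.train()
    return tot_loss / max(tot_frames, 1)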
], batch size: 54, lr: 5.77e-03, grad_scale: 32.0
2023-11-20 03:26:47,505 INFO [train_asr.py:1285] (0/4) Computing validation loss
2023-11-20 03:27:16,009 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1131, 2.2307, 5.0073, 2.5397], device='cuda:0')
2023-11-20 03:27:28,659 INFO [train_asr.py:1294] (0/4) Epoch 12, validation: loss=0.06387, simple_loss=0.05435, pruned_loss=0.006012, audio_tagging_loss=0.03068, over 4681554.00 frames.
2023-11-20 03:27:28,660 INFO [train_asr.py:1295] (0/4) Maximum memory allocated so far is 25925MB
2023-11-20 03:27:43,875 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=22.5
2023-11-20 03:27:48,128 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.723e+01 8.205e+01 8.900e+01 1.006e+02 1.555e+02, threshold=1.780e+02, percent-clipped=0.0
2023-11-20 03:27:48,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=921786.6666666666, ans=0.04949747468305833
2023-11-20 03:27:58,120 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.33 vs. limit=22.5
2023-11-20 03:28:02,303 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.16 vs. limit=22.5
2023-11-20 03:28:02,329 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.21 vs. limit=22.5
2023-11-20 03:28:16,465 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 03:28:16,982 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.24 vs. limit=22.5
2023-11-20 03:28:20,244 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 138300
2023-11-20 03:28:32,675 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 6050, loss[loss=0.08071, simple_loss=0.09527, pruned_loss=0.02237, audio_tagging_loss=0.01071, over 14522.00 frames. ], tot_loss[loss=0.0821, simple_loss=0.1022, pruned_loss=0.02108, audio_tagging_loss=0.009916, over 3039772.40 frames. ], batch size: 54, lr: 5.77e-03, grad_scale: 16.0
2023-11-20 03:28:41,531 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.61 vs.
limit=15.0 2023-11-20 03:28:47,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=922120.0, ans=0.125 2023-11-20 03:28:47,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=922120.0, ans=0.5 2023-11-20 03:28:50,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=922120.0, ans=0.2 2023-11-20 03:28:50,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=922120.0, ans=0.2 2023-11-20 03:29:13,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=922253.3333333334, ans=0.125 2023-11-20 03:29:15,329 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.65 vs. limit=15.0 2023-11-20 03:29:16,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=922253.3333333334, ans=0.0 2023-11-20 03:29:24,345 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 138350 2023-11-20 03:29:28,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=922320.0, ans=0.2 2023-11-20 03:29:37,744 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 6100, loss[loss=0.07827, simple_loss=0.09197, pruned_loss=0.01905, audio_tagging_loss=0.01324, over 15261.00 frames. ], tot_loss[loss=0.08191, simple_loss=0.102, pruned_loss=0.02094, audio_tagging_loss=0.00994, over 3036016.82 frames. ], batch size: 59, lr: 5.76e-03, grad_scale: 16.0 2023-11-20 03:29:41,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=922386.6666666666, ans=0.0 2023-11-20 03:29:45,464 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:29:48,251 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.54 vs. limit=22.5 2023-11-20 03:30:00,020 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.566e+01 7.861e+01 8.501e+01 9.321e+01 2.317e+02, threshold=1.700e+02, percent-clipped=1.0 2023-11-20 03:30:30,133 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 138400 2023-11-20 03:30:41,594 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.563e-03 2023-11-20 03:30:43,108 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 6150, loss[loss=0.09277, simple_loss=0.124, pruned_loss=0.02242, audio_tagging_loss=0.008362, over 16453.00 frames. ], tot_loss[loss=0.08262, simple_loss=0.1034, pruned_loss=0.02107, audio_tagging_loss=0.009868, over 3038442.51 frames. ], batch size: 61, lr: 5.76e-03, grad_scale: 16.0 2023-11-20 03:30:49,991 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.24 vs. 
limit=15.0 2023-11-20 03:31:34,749 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 138450 2023-11-20 03:31:37,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=922986.6666666666, ans=0.025 2023-11-20 03:31:38,069 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.34 vs. limit=12.0 2023-11-20 03:31:42,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=922986.6666666666, ans=0.125 2023-11-20 03:31:47,111 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 6200, loss[loss=0.07405, simple_loss=0.08925, pruned_loss=0.02166, audio_tagging_loss=0.007764, over 13828.00 frames. ], tot_loss[loss=0.08235, simple_loss=0.1026, pruned_loss=0.02102, audio_tagging_loss=0.01004, over 3035011.78 frames. ], batch size: 54, lr: 5.76e-03, grad_scale: 16.0 2023-11-20 03:31:50,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=923053.3333333334, ans=0.125 2023-11-20 03:31:55,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=923053.3333333334, ans=0.05 2023-11-20 03:32:01,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=923120.0, ans=0.0 2023-11-20 03:32:03,134 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:32:09,322 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.126e+01 8.412e+01 9.324e+01 1.012e+02 1.364e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-20 03:32:13,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=923186.6666666666, ans=0.1 2023-11-20 03:32:36,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=923253.3333333334, ans=0.1 2023-11-20 03:32:39,136 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 138500 2023-11-20 03:32:51,988 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 6250, loss[loss=0.08975, simple_loss=0.1094, pruned_loss=0.0235, audio_tagging_loss=0.01154, over 15426.00 frames. ], tot_loss[loss=0.08225, simple_loss=0.102, pruned_loss=0.02107, audio_tagging_loss=0.01019, over 3034808.03 frames. ], batch size: 56, lr: 5.76e-03, grad_scale: 16.0 2023-11-20 03:32:56,985 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.37 vs. 
limit=15.0 2023-11-20 03:33:00,111 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:33:29,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=923586.6666666666, ans=0.04949747468305833 2023-11-20 03:33:33,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=923586.6666666666, ans=0.2 2023-11-20 03:33:34,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=923586.6666666666, ans=0.5 2023-11-20 03:33:43,693 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 138550 2023-11-20 03:33:46,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=923653.3333333334, ans=0.07 2023-11-20 03:33:47,503 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.523e-03 2023-11-20 03:33:51,508 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.48 vs. limit=15.0 2023-11-20 03:33:53,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=923653.3333333334, ans=0.0 2023-11-20 03:33:55,711 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 6300, loss[loss=0.09796, simple_loss=0.1221, pruned_loss=0.02829, audio_tagging_loss=0.008631, over 15783.00 frames. ], tot_loss[loss=0.08198, simple_loss=0.1015, pruned_loss=0.02096, audio_tagging_loss=0.01029, over 3037168.22 frames. ], batch size: 59, lr: 5.76e-03, grad_scale: 16.0 2023-11-20 03:33:57,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=923720.0, ans=0.1 2023-11-20 03:34:00,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=923720.0, ans=0.125 2023-11-20 03:34:10,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=923786.6666666666, ans=0.0 2023-11-20 03:34:17,721 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.678e+01 8.557e+01 9.163e+01 1.006e+02 1.411e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-20 03:34:18,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=923786.6666666666, ans=0.1 2023-11-20 03:34:32,000 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.53 vs. limit=15.0 2023-11-20 03:34:48,361 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 138600 2023-11-20 03:34:48,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=923986.6666666666, ans=0.5 2023-11-20 03:34:59,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=923986.6666666666, ans=0.2 2023-11-20 03:35:01,677 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 6350, loss[loss=0.09162, simple_loss=0.1184, pruned_loss=0.02279, audio_tagging_loss=0.009636, over 14442.00 frames. 
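Each batch line carries two bracketed groups: loss[...] over ~15k frames is the current batch alone, while tot_loss[...] is a frame-weighted running aggregate; its roughly 3.0e6-frame support is consistent with an exponential average with decay 1 - 1/200, i.e. about 200 batches of ~15k frames each. A toy version of that tracker under that assumption; this is not the project's actual metrics class.

# Sketch only: frame-weighted exponential aggregate of recent batch losses.
class RunningLoss:
    def __init__(self, decay: float = 0.995):   # assumed: 1 - 1/200
        self.decay = decay
        self.weighted_loss = 0.0   # decayed sum of loss * frames
        self.frames = 0.0          # decayed frame count (the "over N frames")

    def update(self, batch_loss: float, batch_frames: int) -> None:
        self.weighted_loss = (self.decay * self.weighted_loss
                              + batch_loss * batch_frames)
        self.frames = self.decay * self.frames + batch_frames

    @property
    def value(self) -> float:
        return self.weighted_loss / max(self.frames, 1.0)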
], tot_loss[loss=0.08132, simple_loss=0.1007, pruned_loss=0.0206, audio_tagging_loss=0.01038, over 3038553.14 frames. ], batch size: 53, lr: 5.76e-03, grad_scale: 16.0 2023-11-20 03:35:32,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=924186.6666666666, ans=0.2 2023-11-20 03:35:33,381 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.49 vs. limit=15.0 2023-11-20 03:35:39,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=924253.3333333334, ans=0.125 2023-11-20 03:35:53,797 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 138650 2023-11-20 03:36:06,518 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 6400, loss[loss=0.08325, simple_loss=0.1082, pruned_loss=0.01888, audio_tagging_loss=0.01028, over 14928.00 frames. ], tot_loss[loss=0.08223, simple_loss=0.1017, pruned_loss=0.02094, audio_tagging_loss=0.01044, over 3046267.83 frames. ], batch size: 56, lr: 5.76e-03, grad_scale: 32.0 2023-11-20 03:36:13,929 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.58 vs. limit=15.0 2023-11-20 03:36:28,542 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.492e+01 8.195e+01 8.907e+01 9.891e+01 1.303e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-20 03:36:52,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=924586.6666666666, ans=0.2 2023-11-20 03:36:58,266 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 138700 2023-11-20 03:37:11,078 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 6450, loss[loss=0.06212, simple_loss=0.07587, pruned_loss=0.01274, audio_tagging_loss=0.01145, over 14534.00 frames. ], tot_loss[loss=0.08238, simple_loss=0.1017, pruned_loss=0.02103, audio_tagging_loss=0.01048, over 3046006.41 frames. ], batch size: 57, lr: 5.76e-03, grad_scale: 32.0 2023-11-20 03:37:11,693 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.35 vs. 
limit=15.0 2023-11-20 03:37:33,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=924786.6666666666, ans=0.0 2023-11-20 03:37:33,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=924786.6666666666, ans=0.1 2023-11-20 03:37:58,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=924920.0, ans=0.1 2023-11-20 03:38:01,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=924986.6666666666, ans=0.2 2023-11-20 03:38:01,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=924986.6666666666, ans=0.125 2023-11-20 03:38:02,390 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 138750 2023-11-20 03:38:02,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=924986.6666666666, ans=15.0 2023-11-20 03:38:09,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=924986.6666666666, ans=0.125 2023-11-20 03:38:15,253 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 6500, loss[loss=0.08749, simple_loss=0.1148, pruned_loss=0.02249, audio_tagging_loss=0.007577, over 15520.00 frames. ], tot_loss[loss=0.08268, simple_loss=0.1023, pruned_loss=0.02113, audio_tagging_loss=0.01042, over 3042941.51 frames. ], batch size: 56, lr: 5.76e-03, grad_scale: 32.0 2023-11-20 03:38:18,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=925053.3333333334, ans=0.125 2023-11-20 03:38:27,407 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2023-11-20 03:38:37,926 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.023e+01 8.283e+01 9.037e+01 9.701e+01 1.555e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-20 03:38:40,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=925186.6666666666, ans=0.07 2023-11-20 03:38:44,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=925186.6666666666, ans=0.0 2023-11-20 03:39:03,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=925253.3333333334, ans=0.1 2023-11-20 03:39:06,601 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 138800 2023-11-20 03:39:20,280 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 6550, loss[loss=0.07696, simple_loss=0.09165, pruned_loss=0.02029, audio_tagging_loss=0.01084, over 14771.00 frames. ], tot_loss[loss=0.08212, simple_loss=0.1017, pruned_loss=0.02092, audio_tagging_loss=0.01034, over 3042329.67 frames. ], batch size: 56, lr: 5.76e-03, grad_scale: 32.0 2023-11-20 03:39:22,097 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.89 vs. 
limit=15.0 2023-11-20 03:39:28,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=925386.6666666666, ans=0.1 2023-11-20 03:39:36,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=925453.3333333334, ans=0.0 2023-11-20 03:39:43,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=925453.3333333334, ans=0.125 2023-11-20 03:39:53,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=925520.0, ans=0.1 2023-11-20 03:40:02,663 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.24 vs. limit=22.5 2023-11-20 03:40:07,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=925586.6666666666, ans=0.125 2023-11-20 03:40:12,485 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 138850 2023-11-20 03:40:25,154 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 6600, loss[loss=0.0917, simple_loss=0.1202, pruned_loss=0.02364, audio_tagging_loss=0.00794, over 15814.00 frames. ], tot_loss[loss=0.08166, simple_loss=0.1015, pruned_loss=0.02072, audio_tagging_loss=0.01018, over 3037140.68 frames. ], batch size: 56, lr: 5.75e-03, grad_scale: 32.0 2023-11-20 03:40:43,297 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:40:44,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=925786.6666666666, ans=0.0 2023-11-20 03:40:46,613 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.736e+01 8.079e+01 8.710e+01 9.676e+01 1.211e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-20 03:40:48,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=925786.6666666666, ans=0.2 2023-11-20 03:40:58,104 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.83 vs. limit=15.0 2023-11-20 03:41:17,212 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 138900 2023-11-20 03:41:25,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=925986.6666666666, ans=0.0 2023-11-20 03:41:25,855 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2023-11-20 03:41:26,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=925986.6666666666, ans=0.015 2023-11-20 03:41:29,866 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 6650, loss[loss=0.07568, simple_loss=0.08514, pruned_loss=0.02098, audio_tagging_loss=0.01213, over 14006.00 frames. ], tot_loss[loss=0.08168, simple_loss=0.1019, pruned_loss=0.02069, audio_tagging_loss=0.01006, over 3035883.75 frames. ], batch size: 54, lr: 5.75e-03, grad_scale: 32.0 2023-11-20 03:41:35,371 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.02 vs. 
limit=15.0 2023-11-20 03:41:36,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=926053.3333333334, ans=0.0 2023-11-20 03:41:56,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=926186.6666666666, ans=0.0 2023-11-20 03:42:05,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=926186.6666666666, ans=0.0 2023-11-20 03:42:21,848 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 138950 2023-11-20 03:42:24,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=926320.0, ans=0.07 2023-11-20 03:42:32,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=926386.6666666666, ans=0.125 2023-11-20 03:42:34,584 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 6700, loss[loss=0.06859, simple_loss=0.083, pruned_loss=0.01447, audio_tagging_loss=0.01262, over 14501.00 frames. ], tot_loss[loss=0.08164, simple_loss=0.1016, pruned_loss=0.02066, audio_tagging_loss=0.01017, over 3038812.78 frames. ], batch size: 56, lr: 5.75e-03, grad_scale: 32.0 2023-11-20 03:42:38,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=926386.6666666666, ans=0.125 2023-11-20 03:42:39,574 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=15.0 2023-11-20 03:42:39,692 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.03 vs. limit=15.0 2023-11-20 03:42:48,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=926453.3333333334, ans=0.1 2023-11-20 03:42:56,597 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.827e+01 8.182e+01 8.627e+01 9.432e+01 1.236e+02, threshold=1.725e+02, percent-clipped=0.0 2023-11-20 03:42:58,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=926453.3333333334, ans=0.0 2023-11-20 03:43:26,334 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 139000 2023-11-20 03:43:28,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=926653.3333333334, ans=0.1 2023-11-20 03:43:36,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=926653.3333333334, ans=0.125 2023-11-20 03:43:39,254 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 6750, loss[loss=0.08015, simple_loss=0.093, pruned_loss=0.02013, audio_tagging_loss=0.01352, over 15345.00 frames. ], tot_loss[loss=0.08166, simple_loss=0.1014, pruned_loss=0.02068, audio_tagging_loss=0.01027, over 3039458.16 frames. 
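The tot_loss records above report a weighted combination of the transducer and audio-tagging objectives. The components are consistent with loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss, using the simple_loss_scale and audio_tagging_loss_scale from this run's configuration: for the batch 6750 record, 0.5 * 0.1014 + 0.02068 + 0.01027 = 0.08165, matching the logged loss=0.08166 within rounding. A minimal sketch of that arithmetic (the helper name is illustrative, not icefall code):

```python
# Hedged sketch: reconstruct the logged total loss from its components.
# The 0.5 / 1.0 factors are this run's simple_loss_scale and
# audio_tagging_loss_scale; this illustrates the arithmetic, it is not
# the actual train_asr.py code.

def combine_losses(simple_loss: float,
                   pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_loss_scale: float = 0.5,
                   audio_tagging_loss_scale: float = 1.0) -> float:
    """Frame-normalized total loss as it appears in the log."""
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Check against the epoch 12, batch 6750 record:
print(combine_losses(0.1014, 0.02068, 0.01027))  # ~0.08165 (log: 0.08166)
```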
], batch size: 56, lr: 5.75e-03, grad_scale: 32.0 2023-11-20 03:44:00,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=926786.6666666666, ans=0.125 2023-11-20 03:44:08,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=926853.3333333334, ans=0.0 2023-11-20 03:44:10,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=926853.3333333334, ans=0.125 2023-11-20 03:44:24,934 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.86 vs. limit=15.0 2023-11-20 03:44:30,353 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 139050 2023-11-20 03:44:37,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=926986.6666666666, ans=0.0 2023-11-20 03:44:42,787 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 6800, loss[loss=0.05783, simple_loss=0.06415, pruned_loss=0.01283, audio_tagging_loss=0.01293, over 15699.00 frames. ], tot_loss[loss=0.08174, simple_loss=0.1017, pruned_loss=0.02069, audio_tagging_loss=0.01019, over 3039471.60 frames. ], batch size: 63, lr: 5.75e-03, grad_scale: 32.0 2023-11-20 03:45:06,174 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.197e+01 8.226e+01 8.844e+01 9.966e+01 1.208e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-20 03:45:32,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=927253.3333333334, ans=0.0 2023-11-20 03:45:34,563 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 139100 2023-11-20 03:45:35,074 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.23 vs. limit=15.0 2023-11-20 03:45:46,931 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 6850, loss[loss=0.08936, simple_loss=0.1092, pruned_loss=0.02395, audio_tagging_loss=0.0108, over 14231.00 frames. ], tot_loss[loss=0.08118, simple_loss=0.1009, pruned_loss=0.02056, audio_tagging_loss=0.01019, over 3043142.81 frames. ], batch size: 56, lr: 5.75e-03, grad_scale: 32.0 2023-11-20 03:45:49,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=927386.6666666666, ans=0.0 2023-11-20 03:46:10,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=927453.3333333334, ans=0.1 2023-11-20 03:46:38,585 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 139150 2023-11-20 03:46:39,414 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.30 vs. limit=15.0 2023-11-20 03:46:52,497 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 6900, loss[loss=0.06902, simple_loss=0.08436, pruned_loss=0.017, audio_tagging_loss=0.009846, over 15218.00 frames. ], tot_loss[loss=0.08115, simple_loss=0.1009, pruned_loss=0.02046, audio_tagging_loss=0.01026, over 3036893.80 frames. 
], batch size: 57, lr: 5.75e-03, grad_scale: 32.0 2023-11-20 03:47:16,180 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.998e+01 8.075e+01 8.808e+01 9.662e+01 1.235e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-20 03:47:18,196 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.77 vs. limit=10.0 2023-11-20 03:47:21,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=927853.3333333334, ans=0.125 2023-11-20 03:47:28,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=927920.0, ans=0.0 2023-11-20 03:47:28,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=927920.0, ans=0.0 2023-11-20 03:47:39,370 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.521e-01 2023-11-20 03:47:42,937 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 03:47:44,313 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 139200 2023-11-20 03:47:57,546 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 6950, loss[loss=0.08223, simple_loss=0.1021, pruned_loss=0.02162, audio_tagging_loss=0.009555, over 14811.00 frames. ], tot_loss[loss=0.08225, simple_loss=0.1025, pruned_loss=0.02079, audio_tagging_loss=0.01022, over 3042023.42 frames. ], batch size: 54, lr: 5.75e-03, grad_scale: 16.0 2023-11-20 03:48:08,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=928120.0, ans=0.025 2023-11-20 03:48:15,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=928120.0, ans=0.125 2023-11-20 03:48:18,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=928120.0, ans=0.125 2023-11-20 03:48:36,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=928253.3333333334, ans=0.125 2023-11-20 03:48:36,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=928253.3333333334, ans=0.0 2023-11-20 03:48:48,902 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 139250 2023-11-20 03:48:58,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=928320.0, ans=0.0 2023-11-20 03:49:01,014 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 7000, loss[loss=0.07419, simple_loss=0.09387, pruned_loss=0.01481, audio_tagging_loss=0.01244, over 14996.00 frames. ], tot_loss[loss=0.08211, simple_loss=0.1024, pruned_loss=0.0207, audio_tagging_loss=0.01019, over 3042022.25 frames. 
], batch size: 55, lr: 5.75e-03, grad_scale: 16.0 2023-11-20 03:49:25,770 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.837e+01 8.169e+01 8.824e+01 9.776e+01 1.242e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-20 03:49:48,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=928586.6666666666, ans=0.0 2023-11-20 03:49:52,156 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 139300 2023-11-20 03:49:55,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=928653.3333333334, ans=0.125 2023-11-20 03:50:04,951 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.11 vs. limit=15.0 2023-11-20 03:50:05,520 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 7050, loss[loss=0.08699, simple_loss=0.1055, pruned_loss=0.02398, audio_tagging_loss=0.01026, over 14563.00 frames. ], tot_loss[loss=0.08269, simple_loss=0.1032, pruned_loss=0.02088, audio_tagging_loss=0.01019, over 3041413.53 frames. ], batch size: 56, lr: 5.75e-03, grad_scale: 16.0 2023-11-20 03:50:11,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=928720.0, ans=0.1 2023-11-20 03:50:18,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=928786.6666666666, ans=0.125 2023-11-20 03:50:25,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=928786.6666666666, ans=0.125 2023-11-20 03:50:46,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=928920.0, ans=0.1 2023-11-20 03:50:57,612 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 139350 2023-11-20 03:51:07,283 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:51:07,492 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2023-11-20 03:51:08,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=928986.6666666666, ans=0.0 2023-11-20 03:51:10,696 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 7100, loss[loss=0.07813, simple_loss=0.09281, pruned_loss=0.02035, audio_tagging_loss=0.01137, over 16309.00 frames. ], tot_loss[loss=0.08205, simple_loss=0.102, pruned_loss=0.02074, audio_tagging_loss=0.01033, over 3043530.50 frames. ], batch size: 62, lr: 5.74e-03, grad_scale: 16.0 2023-11-20 03:51:15,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=929053.3333333334, ans=0.0 2023-11-20 03:51:16,917 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.14 vs. 
limit=22.5 2023-11-20 03:51:18,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=929053.3333333334, ans=0.125 2023-11-20 03:51:34,670 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.602e+01 8.104e+01 8.912e+01 9.574e+01 1.346e+02, threshold=1.782e+02, percent-clipped=0.0 2023-11-20 03:51:35,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=929186.6666666666, ans=0.125 2023-11-20 03:52:03,533 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 139400 2023-11-20 03:52:15,901 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 7150, loss[loss=0.0694, simple_loss=0.07888, pruned_loss=0.01924, audio_tagging_loss=0.01072, over 14568.00 frames. ], tot_loss[loss=0.08208, simple_loss=0.1019, pruned_loss=0.0208, audio_tagging_loss=0.01032, over 3042642.78 frames. ], batch size: 58, lr: 5.74e-03, grad_scale: 16.0 2023-11-20 03:52:54,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=929586.6666666666, ans=0.0 2023-11-20 03:53:00,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=929586.6666666666, ans=0.0 2023-11-20 03:53:02,302 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.60 vs. limit=22.5 2023-11-20 03:53:06,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=929653.3333333334, ans=0.0 2023-11-20 03:53:07,784 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 139450 2023-11-20 03:53:20,614 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 7200, loss[loss=0.08867, simple_loss=0.1089, pruned_loss=0.02309, audio_tagging_loss=0.01115, over 15275.00 frames. ], tot_loss[loss=0.08163, simple_loss=0.1014, pruned_loss=0.02055, audio_tagging_loss=0.01039, over 3037704.81 frames. ], batch size: 57, lr: 5.74e-03, grad_scale: 32.0 2023-11-20 03:53:25,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=929720.0, ans=0.125 2023-11-20 03:53:42,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=929786.6666666666, ans=0.0 2023-11-20 03:53:45,896 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.207e+01 8.353e+01 8.991e+01 9.790e+01 1.410e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-20 03:53:52,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=929853.3333333334, ans=0.1 2023-11-20 03:54:13,125 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 139500 2023-11-20 03:54:25,362 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 7250, loss[loss=0.0701, simple_loss=0.08213, pruned_loss=0.01618, audio_tagging_loss=0.01286, over 14575.00 frames. ], tot_loss[loss=0.08178, simple_loss=0.1016, pruned_loss=0.02057, audio_tagging_loss=0.01041, over 3035924.52 frames. 
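In every optim.py record here the clipping threshold is exactly Clipping_scale times the median of the recent grad-norm distribution (e.g. 2.0 * 8.912e+01 = 1.782e+02 in the record above), and percent-clipped reports how often norms exceeded it. A sketch of that bookkeeping, assuming a sliding window of norms; the window size and exact clipping mechanics are guesses:

```python
import statistics
from collections import deque

# Hedged sketch of the optim.py grad-norm bookkeeping: track recent norms,
# derive the threshold as clipping_scale * median, and count clipped steps.

class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
        self.clipping_scale = clipping_scale
        self.norms: deque[float] = deque(maxlen=window)
        self.num_steps = 0
        self.num_clipped = 0

    def update(self, grad_norm: float) -> float:
        """Record one step's grad norm; return the current threshold."""
        self.norms.append(grad_norm)
        if len(self.norms) >= 4:
            _q1, median, _q3 = statistics.quantiles(self.norms, n=4)
        else:
            median = grad_norm
        threshold = self.clipping_scale * median  # e.g. 2.0 * 89.12 = 178.2
        self.num_steps += 1
        self.num_clipped += grad_norm > threshold
        return threshold

    def percent_clipped(self) -> float:
        return 100.0 * self.num_clipped / max(1, self.num_steps)
```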
], batch size: 59, lr: 5.74e-03, grad_scale: 16.0 2023-11-20 03:54:29,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=930053.3333333334, ans=0.125 2023-11-20 03:54:33,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=930053.3333333334, ans=0.125 2023-11-20 03:55:03,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=930253.3333333334, ans=0.0 2023-11-20 03:55:05,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=930253.3333333334, ans=0.95 2023-11-20 03:55:15,509 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.68 vs. limit=15.0 2023-11-20 03:55:16,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=930320.0, ans=0.2 2023-11-20 03:55:17,515 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 139550 2023-11-20 03:55:30,583 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.36 vs. limit=12.0 2023-11-20 03:55:30,930 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 7300, loss[loss=0.0828, simple_loss=0.1061, pruned_loss=0.02002, audio_tagging_loss=0.009741, over 14642.00 frames. ], tot_loss[loss=0.08127, simple_loss=0.1009, pruned_loss=0.0205, audio_tagging_loss=0.01034, over 3030607.51 frames. ], batch size: 55, lr: 5.74e-03, grad_scale: 16.0 2023-11-20 03:55:33,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=930386.6666666666, ans=0.1 2023-11-20 03:55:33,893 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2023-11-20 03:55:37,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=930386.6666666666, ans=0.0 2023-11-20 03:55:43,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=930453.3333333334, ans=0.1 2023-11-20 03:55:56,589 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 5.831e+01 8.147e+01 8.750e+01 9.433e+01 1.159e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-20 03:56:22,045 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 139600 2023-11-20 03:56:35,316 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 7350, loss[loss=0.06336, simple_loss=0.0718, pruned_loss=0.01393, audio_tagging_loss=0.01353, over 14698.00 frames. ], tot_loss[loss=0.08047, simple_loss=0.1, pruned_loss=0.02015, audio_tagging_loss=0.0103, over 3032862.83 frames. 
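The dense stream of ScheduledFloat records shows module hyperparameters (dropout_p, skip rates, balancer probabilities, scale_min) that are functions of batch_count rather than constants. The fractional batch_count values advance in steps of 5/3, i.e. max_duration/ref_duration = 1000/600 for this run, suggesting the schedule clock is duration-normalized rather than a raw batch index (an inference from the log, not a documented fact). A minimal piecewise-linear schedule in that spirit, not zipformer's actual scaling.py:

```python
import bisect

# Hedged sketch of a ScheduledFloat-like value: piecewise-linear in
# batch_count, constant outside the given schedule points.

class ScheduledValue:
    def __init__(self, *points: tuple[float, float]):
        self.xs = [x for x, _ in points]   # batch_count breakpoints
        self.ys = [y for _, y in points]   # values at those breakpoints

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1, y0, y1 = self.xs[i - 1], self.xs[i], self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a dropout_p annealed from 0.3 to 0.1 over the first 20k counts:
dropout_p = ScheduledValue((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(930253.3333333334))  # 0.1, schedule long since exhausted
```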
], batch size: 56, lr: 5.74e-03, grad_scale: 16.0 2023-11-20 03:56:39,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=930720.0, ans=0.125 2023-11-20 03:57:00,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=930853.3333333334, ans=0.125 2023-11-20 03:57:26,981 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 139650 2023-11-20 03:57:37,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=930986.6666666666, ans=0.125 2023-11-20 03:57:38,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=931053.3333333334, ans=0.2 2023-11-20 03:57:39,593 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 7400, loss[loss=0.08025, simple_loss=0.1085, pruned_loss=0.02013, audio_tagging_loss=0.005888, over 16551.00 frames. ], tot_loss[loss=0.08082, simple_loss=0.1008, pruned_loss=0.02022, audio_tagging_loss=0.01021, over 3031960.78 frames. ], batch size: 62, lr: 5.74e-03, grad_scale: 16.0 2023-11-20 03:57:41,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=931053.3333333334, ans=0.0 2023-11-20 03:57:42,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=931053.3333333334, ans=0.0 2023-11-20 03:57:45,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=931053.3333333334, ans=0.0 2023-11-20 03:57:45,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=931053.3333333334, ans=0.0 2023-11-20 03:58:05,074 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.423e+01 7.809e+01 8.517e+01 9.487e+01 1.228e+02, threshold=1.703e+02, percent-clipped=0.0 2023-11-20 03:58:25,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=931253.3333333334, ans=0.1 2023-11-20 03:58:30,989 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 139700 2023-11-20 03:58:33,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=931320.0, ans=0.05 2023-11-20 03:58:44,232 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 7450, loss[loss=0.07871, simple_loss=0.09265, pruned_loss=0.01959, audio_tagging_loss=0.01278, over 15345.00 frames. ], tot_loss[loss=0.08123, simple_loss=0.101, pruned_loss=0.02055, audio_tagging_loss=0.0102, over 3037915.22 frames. ], batch size: 57, lr: 5.74e-03, grad_scale: 16.0 2023-11-20 03:58:55,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=931453.3333333334, ans=0.07 2023-11-20 03:59:16,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=931520.0, ans=0.0 2023-11-20 03:59:19,272 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.22 vs. limit=15.0 2023-11-20 03:59:21,781 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.50 vs. 
limit=15.0 2023-11-20 03:59:35,535 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 139750 2023-11-20 03:59:36,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=931653.3333333334, ans=0.125 2023-11-20 03:59:47,770 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 7500, loss[loss=0.08701, simple_loss=0.1159, pruned_loss=0.02155, audio_tagging_loss=0.007518, over 14660.00 frames. ], tot_loss[loss=0.08111, simple_loss=0.1007, pruned_loss=0.02062, audio_tagging_loss=0.01015, over 3037200.39 frames. ], batch size: 56, lr: 5.74e-03, grad_scale: 16.0 2023-11-20 03:59:49,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=931720.0, ans=0.0 2023-11-20 03:59:55,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=931720.0, ans=0.0 2023-11-20 04:00:11,748 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.21 vs. limit=15.0 2023-11-20 04:00:13,506 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.250e+01 8.176e+01 8.772e+01 9.671e+01 2.176e+02, threshold=1.754e+02, percent-clipped=1.0 2023-11-20 04:00:29,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=931920.0, ans=0.0 2023-11-20 04:00:39,802 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 139800 2023-11-20 04:00:41,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=931986.6666666666, ans=0.125 2023-11-20 04:00:52,979 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 7550, loss[loss=0.08186, simple_loss=0.09741, pruned_loss=0.02344, audio_tagging_loss=0.009712, over 14728.00 frames. ], tot_loss[loss=0.08084, simple_loss=0.1005, pruned_loss=0.02048, audio_tagging_loss=0.01013, over 3040488.90 frames. ], batch size: 56, lr: 5.73e-03, grad_scale: 16.0 2023-11-20 04:01:04,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=932120.0, ans=0.1 2023-11-20 04:01:07,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=932120.0, ans=0.1 2023-11-20 04:01:08,530 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 04:01:19,638 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.04 vs. limit=10.0 2023-11-20 04:01:31,395 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.33 vs. limit=10.0 2023-11-20 04:01:44,729 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 139850 2023-11-20 04:01:51,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=932320.0, ans=0.09899494936611666 2023-11-20 04:01:57,660 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 7600, loss[loss=0.07705, simple_loss=0.09631, pruned_loss=0.01765, audio_tagging_loss=0.01124, over 16233.00 frames. 
], tot_loss[loss=0.0817, simple_loss=0.1017, pruned_loss=0.02073, audio_tagging_loss=0.0101, over 3047398.79 frames. ], batch size: 60, lr: 5.73e-03, grad_scale: 32.0 2023-11-20 04:02:00,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=932386.6666666666, ans=0.0 2023-11-20 04:02:05,045 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.52 vs. limit=15.0 2023-11-20 04:02:13,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=932453.3333333334, ans=0.125 2023-11-20 04:02:14,856 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.21 vs. limit=22.5 2023-11-20 04:02:20,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=932453.3333333334, ans=0.0 2023-11-20 04:02:23,912 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.765e+01 8.108e+01 8.753e+01 9.539e+01 1.294e+02, threshold=1.751e+02, percent-clipped=0.0 2023-11-20 04:02:34,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=932520.0, ans=0.1 2023-11-20 04:02:44,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=932586.6666666666, ans=0.2 2023-11-20 04:02:49,274 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 139900 2023-11-20 04:03:02,135 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 7650, loss[loss=0.08648, simple_loss=0.1105, pruned_loss=0.02298, audio_tagging_loss=0.008227, over 15180.00 frames. ], tot_loss[loss=0.08122, simple_loss=0.1009, pruned_loss=0.02054, audio_tagging_loss=0.01024, over 3042494.39 frames. ], batch size: 57, lr: 5.73e-03, grad_scale: 32.0 2023-11-20 04:03:14,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=932786.6666666666, ans=0.0 2023-11-20 04:03:19,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=932786.6666666666, ans=0.125 2023-11-20 04:03:37,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=932853.3333333334, ans=0.125 2023-11-20 04:03:40,296 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.47 vs. limit=15.0 2023-11-20 04:03:53,568 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 139950 2023-11-20 04:03:55,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=932986.6666666666, ans=0.125 2023-11-20 04:04:02,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=932986.6666666666, ans=0.0 2023-11-20 04:04:07,047 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 7700, loss[loss=0.0811, simple_loss=0.09984, pruned_loss=0.02111, audio_tagging_loss=0.01007, over 15812.00 frames. ], tot_loss[loss=0.08086, simple_loss=0.1006, pruned_loss=0.02039, audio_tagging_loss=0.01016, over 3042582.58 frames. 
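Each Whitening record compares a per-module metric against a limit, and the "vs. limit" phrasing suggests a correction is applied only when the metric exceeds it. One plausible whiteness proxy, assumed here and not taken from scaling.py, is the eigenvalue dispersion of the feature covariance: exactly 1.0 when the channels are perfectly white and larger as energy concentrates in a few directions, which would make values like metric=12.95 vs. limit=15.0 a distance-from-white measurement:

```python
import torch

# Hedged sketch of a possible "whitening metric": eigenvalue dispersion
# E[lambda^2] / E[lambda]^2 of the feature covariance. This is an assumed
# formula, not the one in scaling.py.

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels) activations."""
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]               # (C, C) covariance
    c = cov.shape[0]
    mean_eig = torch.diagonal(cov).mean()      # E[lambda] = trace / C
    mean_eig_sq = (cov * cov).sum() / c        # E[lambda^2] = ||cov||_F^2 / C
    return (mean_eig_sq / mean_eig ** 2).item()

print(whitening_metric(torch.randn(4000, 512)))  # ~1.0 for white noise
```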
], batch size: 58, lr: 5.73e-03, grad_scale: 32.0 2023-11-20 04:04:21,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=933120.0, ans=0.125 2023-11-20 04:04:32,000 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.410e+01 8.189e+01 8.877e+01 9.506e+01 1.213e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-20 04:04:42,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=933186.6666666666, ans=0.125 2023-11-20 04:04:45,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=933253.3333333334, ans=0.125 2023-11-20 04:04:55,531 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.42 vs. limit=15.0 2023-11-20 04:04:58,471 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 140000 2023-11-20 04:05:00,074 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-140000.pt 2023-11-20 04:05:08,026 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.61 vs. limit=22.5 2023-11-20 04:05:14,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=933386.6666666666, ans=0.2 2023-11-20 04:05:15,101 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 7750, loss[loss=0.09142, simple_loss=0.1167, pruned_loss=0.0234, audio_tagging_loss=0.00968, over 14949.00 frames. ], tot_loss[loss=0.08096, simple_loss=0.1005, pruned_loss=0.02045, audio_tagging_loss=0.01025, over 3038466.12 frames. ], batch size: 55, lr: 5.73e-03, grad_scale: 32.0 2023-11-20 04:05:15,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=933386.6666666666, ans=0.0 2023-11-20 04:05:33,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=933453.3333333334, ans=0.125 2023-11-20 04:05:33,481 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.95 vs. limit=15.0 2023-11-20 04:05:41,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=933520.0, ans=0.1 2023-11-20 04:05:44,549 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.16 vs. limit=22.5 2023-11-20 04:05:59,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=933586.6666666666, ans=0.125 2023-11-20 04:06:06,670 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 140050 2023-11-20 04:06:19,733 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 7800, loss[loss=0.05206, simple_loss=0.06456, pruned_loss=0.01026, audio_tagging_loss=0.009524, over 14734.00 frames. ], tot_loss[loss=0.08136, simple_loss=0.1011, pruned_loss=0.02057, audio_tagging_loss=0.01024, over 3040964.16 frames. 
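The checkpoint.py record above fires at batch 140000, consistent with a fixed save interval (save_every_n=4000 in this run's configuration, of which 140000 is a multiple). A sketch of that periodic save; the helper and the exact contents of the saved dict are illustrative:

```python
import torch
from pathlib import Path

# Hedged sketch of the periodic checkpointing seen above
# (checkpoint-140000.pt). save_every_n=4000 is the run's config value.

def maybe_save_checkpoint(model: torch.nn.Module,
                          optimizer: torch.optim.Optimizer,
                          batch_idx_train: int,
                          exp_dir: Path,
                          save_every_n: int = 4000) -> None:
    if batch_idx_train == 0 or batch_idx_train % save_every_n != 0:
        return
    path = exp_dir / f"checkpoint-{batch_idx_train}.pt"
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "batch_idx_train": batch_idx_train,
        },
        path,
    )
```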
], batch size: 56, lr: 5.73e-03, grad_scale: 16.0 2023-11-20 04:06:46,658 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.169e+01 8.821e+01 9.790e+01 1.228e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-20 04:07:11,509 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 140100 2023-11-20 04:07:11,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=933986.6666666666, ans=0.2 2023-11-20 04:07:15,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=933986.6666666666, ans=10.0 2023-11-20 04:07:19,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=933986.6666666666, ans=0.0 2023-11-20 04:07:24,832 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 7850, loss[loss=0.1291, simple_loss=0.1494, pruned_loss=0.04622, audio_tagging_loss=0.008208, over 15941.00 frames. ], tot_loss[loss=0.08144, simple_loss=0.1013, pruned_loss=0.02053, audio_tagging_loss=0.01027, over 3045722.66 frames. ], batch size: 57, lr: 5.73e-03, grad_scale: 16.0 2023-11-20 04:07:35,471 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.56 vs. limit=15.0 2023-11-20 04:07:51,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=934186.6666666666, ans=0.125 2023-11-20 04:08:16,470 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 140150 2023-11-20 04:08:17,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=934320.0, ans=0.0 2023-11-20 04:08:22,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=934320.0, ans=0.125 2023-11-20 04:08:28,683 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 7900, loss[loss=0.07524, simple_loss=0.09905, pruned_loss=0.01345, audio_tagging_loss=0.01227, over 15391.00 frames. ], tot_loss[loss=0.08068, simple_loss=0.1001, pruned_loss=0.02027, audio_tagging_loss=0.01034, over 3042589.24 frames. ], batch size: 57, lr: 5.73e-03, grad_scale: 16.0 2023-11-20 04:08:36,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=934386.6666666666, ans=0.125 2023-11-20 04:08:50,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=934453.3333333334, ans=0.125 2023-11-20 04:08:56,539 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.338e+01 8.229e+01 9.116e+01 9.916e+01 1.318e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-20 04:09:21,245 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 140200 2023-11-20 04:09:25,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=934653.3333333334, ans=0.125 2023-11-20 04:09:33,518 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 7950, loss[loss=0.04863, simple_loss=0.05204, pruned_loss=0.01274, audio_tagging_loss=0.009871, over 13255.00 frames. ], tot_loss[loss=0.08136, simple_loss=0.101, pruned_loss=0.02045, audio_tagging_loss=0.01043, over 3042865.73 frames. 
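The cumulative frame counts in the tot_loss records are fractional (3042865.73 frames, for example), which a plain running sum of integer batch sizes cannot produce; it points to an exponentially decayed accumulator. A sketch under that assumption, with the decay tied to this run's reset_interval=200 (a guess, not confirmed by the log):

```python
# Hedged sketch of the running "tot_loss ... over N frames" statistic:
# a frame-weighted average with exponential forgetting. The decay
# 1 - 1/reset_interval (reset_interval=200) is an assumption.

class RunningLoss:
    def __init__(self, reset_interval: int = 200):
        self.decay = 1.0 - 1.0 / reset_interval
        self.weighted_loss = 0.0  # decayed sum of loss * frames
        self.frames = 0.0         # decayed sum of frames (hence fractional)

    def update(self, batch_loss: float, batch_frames: int) -> float:
        self.weighted_loss = (self.weighted_loss * self.decay
                              + batch_loss * batch_frames)
        self.frames = self.frames * self.decay + batch_frames
        return self.weighted_loss / self.frames  # the logged tot_loss
```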
], batch size: 53, lr: 5.73e-03, grad_scale: 16.0 2023-11-20 04:09:44,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=934720.0, ans=0.125 2023-11-20 04:09:50,737 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 04:09:50,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=934786.6666666666, ans=0.2 2023-11-20 04:09:53,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=934786.6666666666, ans=0.2 2023-11-20 04:10:24,401 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 140250 2023-11-20 04:10:38,696 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 8000, loss[loss=0.08617, simple_loss=0.1095, pruned_loss=0.02392, audio_tagging_loss=0.007485, over 14761.00 frames. ], tot_loss[loss=0.0813, simple_loss=0.1006, pruned_loss=0.02053, audio_tagging_loss=0.01046, over 3033680.91 frames. ], batch size: 56, lr: 5.73e-03, grad_scale: 32.0 2023-11-20 04:11:02,365 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.94 vs. limit=6.0 2023-11-20 04:11:05,189 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.597e+01 8.205e+01 8.859e+01 1.003e+02 1.525e+02, threshold=1.772e+02, percent-clipped=0.0 2023-11-20 04:11:15,898 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.30 vs. limit=22.5 2023-11-20 04:11:19,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=935253.3333333334, ans=0.09899494936611666 2023-11-20 04:11:30,809 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 140300 2023-11-20 04:11:32,730 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.12 vs. limit=15.0 2023-11-20 04:11:35,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=935320.0, ans=0.125 2023-11-20 04:11:42,848 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 8050, loss[loss=0.07834, simple_loss=0.096, pruned_loss=0.01837, audio_tagging_loss=0.01197, over 14467.00 frames. ], tot_loss[loss=0.08098, simple_loss=0.1001, pruned_loss=0.02039, audio_tagging_loss=0.01056, over 3032001.33 frames. ], batch size: 56, lr: 5.72e-03, grad_scale: 32.0 2023-11-20 04:12:16,202 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.61 vs. 
limit=6.0 2023-11-20 04:12:25,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=935586.6666666666, ans=0.0 2023-11-20 04:12:28,525 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.49 vs. limit=15.0 2023-11-20 04:12:33,890 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 140350 2023-11-20 04:12:46,658 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 8100, loss[loss=0.05694, simple_loss=0.06874, pruned_loss=0.01436, audio_tagging_loss=0.008209, over 14852.00 frames. ], tot_loss[loss=0.08123, simple_loss=0.1005, pruned_loss=0.0205, audio_tagging_loss=0.01051, over 3033939.35 frames. ], batch size: 56, lr: 5.72e-03, grad_scale: 32.0 2023-11-20 04:12:46,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=935720.0, ans=0.125 2023-11-20 04:12:48,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=935720.0, ans=0.125 2023-11-20 04:12:51,069 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.09 vs. limit=15.0 2023-11-20 04:13:07,641 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.14 vs. limit=22.5 2023-11-20 04:13:08,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=935786.6666666666, ans=0.2 2023-11-20 04:13:14,180 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.312e+01 8.397e+01 8.892e+01 9.736e+01 1.286e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-20 04:13:35,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=935920.0, ans=0.1 2023-11-20 04:13:38,145 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 140400 2023-11-20 04:13:51,048 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 8150, loss[loss=0.08406, simple_loss=0.1071, pruned_loss=0.02235, audio_tagging_loss=0.008145, over 14073.00 frames. ], tot_loss[loss=0.08197, simple_loss=0.102, pruned_loss=0.02075, audio_tagging_loss=0.01024, over 3041201.52 frames. ], batch size: 54, lr: 5.72e-03, grad_scale: 32.0 2023-11-20 04:14:01,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=936053.3333333334, ans=0.1 2023-11-20 04:14:05,350 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.62 vs. 
limit=10.0 2023-11-20 04:14:06,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=936120.0, ans=0.0 2023-11-20 04:14:22,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=936186.6666666666, ans=0.125 2023-11-20 04:14:43,645 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 140450 2023-11-20 04:14:43,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=936320.0, ans=0.125 2023-11-20 04:14:51,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=936320.0, ans=0.125 2023-11-20 04:14:56,554 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 8200, loss[loss=0.06984, simple_loss=0.08685, pruned_loss=0.01666, audio_tagging_loss=0.009752, over 15425.00 frames. ], tot_loss[loss=0.0821, simple_loss=0.1022, pruned_loss=0.02086, audio_tagging_loss=0.01011, over 3044319.43 frames. ], batch size: 58, lr: 5.72e-03, grad_scale: 32.0 2023-11-20 04:14:56,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=936386.6666666666, ans=0.1 2023-11-20 04:14:57,841 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 04:15:23,156 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.922e+01 8.334e+01 8.915e+01 9.605e+01 1.213e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-20 04:15:30,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=936520.0, ans=0.2 2023-11-20 04:15:34,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=936586.6666666666, ans=0.1 2023-11-20 04:15:43,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=936586.6666666666, ans=0.2 2023-11-20 04:15:48,654 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 140500 2023-11-20 04:15:51,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=936653.3333333334, ans=0.07 2023-11-20 04:16:01,643 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 8250, loss[loss=0.07552, simple_loss=0.09578, pruned_loss=0.01835, audio_tagging_loss=0.009278, over 14115.00 frames. ], tot_loss[loss=0.08158, simple_loss=0.1017, pruned_loss=0.02074, audio_tagging_loss=0.009984, over 3037462.14 frames. ], batch size: 55, lr: 5.72e-03, grad_scale: 32.0 2023-11-20 04:16:05,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=936720.0, ans=0.125 2023-11-20 04:16:35,788 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.68 vs. 
limit=15.0 2023-11-20 04:16:52,928 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 140550 2023-11-20 04:17:04,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=937053.3333333334, ans=0.125 2023-11-20 04:17:05,472 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 8300, loss[loss=0.06858, simple_loss=0.07649, pruned_loss=0.01997, audio_tagging_loss=0.01037, over 14500.00 frames. ], tot_loss[loss=0.08169, simple_loss=0.102, pruned_loss=0.02069, audio_tagging_loss=0.00999, over 3039549.72 frames. ], batch size: 56, lr: 5.72e-03, grad_scale: 16.0 2023-11-20 04:17:05,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=937053.3333333334, ans=0.0 2023-11-20 04:17:13,880 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.50 vs. limit=22.5 2023-11-20 04:17:18,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=937120.0, ans=0.0 2023-11-20 04:17:33,793 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.558e+01 8.113e+01 8.924e+01 9.899e+01 1.160e+02, threshold=1.785e+02, percent-clipped=0.0 2023-11-20 04:17:44,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=937253.3333333334, ans=0.125 2023-11-20 04:17:47,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=937253.3333333334, ans=0.125 2023-11-20 04:17:56,382 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 140600 2023-11-20 04:18:09,914 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 8350, loss[loss=0.08483, simple_loss=0.1115, pruned_loss=0.02078, audio_tagging_loss=0.008299, over 15369.00 frames. ], tot_loss[loss=0.08191, simple_loss=0.1023, pruned_loss=0.02079, audio_tagging_loss=0.009943, over 3050698.87 frames. ], batch size: 57, lr: 5.72e-03, grad_scale: 16.0 2023-11-20 04:18:24,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=937453.3333333334, ans=0.125 2023-11-20 04:18:53,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=937586.6666666666, ans=0.07 2023-11-20 04:18:55,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=937586.6666666666, ans=0.125 2023-11-20 04:19:02,273 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 140650 2023-11-20 04:19:15,098 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 8400, loss[loss=0.0673, simple_loss=0.08123, pruned_loss=0.016, audio_tagging_loss=0.01069, over 15022.00 frames. ], tot_loss[loss=0.08055, simple_loss=0.1005, pruned_loss=0.02039, audio_tagging_loss=0.009889, over 3046494.97 frames. 
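The Balancer entries in these records carry per-channel activation constraints: min_positive and max_positive bound the fraction of positive activations in a channel, and prob (0.125 here) reads like the probability with which the constraint is enforced on a given step. Only the measurement side is sketched below; zipformer's actual Balancer intervenes through the gradients, which is not reproduced here:

```python
import torch

# Hedged sketch of what the Balancer stats appear to constrain: the
# per-channel fraction of positive activations. Enforcement (a gradient
# modification in zipformer) is deliberately omitted.

def positive_fraction(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels) -> fraction of positives per channel."""
    return (x > 0).float().mean(dim=0)

def channels_out_of_bounds(x: torch.Tensor,
                           min_positive: float = 0.025,
                           max_positive: float = 0.95) -> torch.Tensor:
    frac = positive_fraction(x)
    return (frac < min_positive) | (frac > max_positive)  # needs nudging
```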
], batch size: 57, lr: 5.72e-03, grad_scale: 32.0 2023-11-20 04:19:15,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=937720.0, ans=0.125 2023-11-20 04:19:43,611 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.796e+01 7.950e+01 8.532e+01 9.547e+01 1.131e+02, threshold=1.706e+02, percent-clipped=0.0 2023-11-20 04:20:01,618 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=15.0 2023-11-20 04:20:06,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=937986.6666666666, ans=15.0 2023-11-20 04:20:07,374 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 140700 2023-11-20 04:20:13,066 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.20 vs. limit=15.0 2023-11-20 04:20:19,657 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 8450, loss[loss=0.05702, simple_loss=0.06209, pruned_loss=0.01262, audio_tagging_loss=0.01336, over 14801.00 frames. ], tot_loss[loss=0.07998, simple_loss=0.09948, pruned_loss=0.0202, audio_tagging_loss=0.01004, over 3048314.11 frames. ], batch size: 57, lr: 5.72e-03, grad_scale: 32.0 2023-11-20 04:20:36,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=938120.0, ans=0.2 2023-11-20 04:20:48,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=938186.6666666666, ans=0.0 2023-11-20 04:20:54,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=938186.6666666666, ans=0.0 2023-11-20 04:21:12,266 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 140750 2023-11-20 04:21:19,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=938320.0, ans=0.0 2023-11-20 04:21:19,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=938320.0, ans=0.0 2023-11-20 04:21:20,635 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.71 vs. limit=22.5 2023-11-20 04:21:24,931 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 8500, loss[loss=0.06592, simple_loss=0.07782, pruned_loss=0.0186, audio_tagging_loss=0.008409, over 15628.00 frames. ], tot_loss[loss=0.07988, simple_loss=0.09923, pruned_loss=0.02016, audio_tagging_loss=0.01011, over 3052630.59 frames. ], batch size: 58, lr: 5.72e-03, grad_scale: 32.0 2023-11-20 04:21:26,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=938386.6666666666, ans=0.125 2023-11-20 04:21:42,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=938453.3333333334, ans=0.125 2023-11-20 04:21:43,023 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.94 vs. 
limit=15.0 2023-11-20 04:21:53,277 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.458e+01 8.139e+01 8.950e+01 9.720e+01 1.235e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-20 04:22:03,377 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.07 vs. limit=15.0 2023-11-20 04:22:12,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=938586.6666666666, ans=0.1 2023-11-20 04:22:16,803 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 140800 2023-11-20 04:22:20,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=938653.3333333334, ans=0.2 2023-11-20 04:22:30,072 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 8550, loss[loss=0.08419, simple_loss=0.1094, pruned_loss=0.02013, audio_tagging_loss=0.00935, over 15500.00 frames. ], tot_loss[loss=0.07993, simple_loss=0.09956, pruned_loss=0.02008, audio_tagging_loss=0.01006, over 3049948.97 frames. ], batch size: 57, lr: 5.71e-03, grad_scale: 32.0 2023-11-20 04:22:34,546 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.96 vs. limit=15.0 2023-11-20 04:22:50,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=938786.6666666666, ans=0.0 2023-11-20 04:23:04,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=938853.3333333334, ans=0.0 2023-11-20 04:23:20,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=938986.6666666666, ans=0.125 2023-11-20 04:23:21,939 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 140850 2023-11-20 04:23:29,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=938986.6666666666, ans=0.2 2023-11-20 04:23:33,641 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.00 vs. limit=15.0 2023-11-20 04:23:34,030 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 8600, loss[loss=0.1005, simple_loss=0.1237, pruned_loss=0.02667, audio_tagging_loss=0.01196, over 14984.00 frames. ], tot_loss[loss=0.07975, simple_loss=0.09897, pruned_loss=0.02002, audio_tagging_loss=0.01024, over 3053497.80 frames. ], batch size: 54, lr: 5.71e-03, grad_scale: 16.0 2023-11-20 04:23:46,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=939120.0, ans=0.2 2023-11-20 04:23:55,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=939120.0, ans=15.0 2023-11-20 04:24:04,340 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.979e+01 8.151e+01 8.812e+01 9.579e+01 1.857e+02, threshold=1.762e+02, percent-clipped=1.0 2023-11-20 04:24:09,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=939186.6666666666, ans=0.2 2023-11-20 04:24:09,601 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.38 vs. 
limit=22.5
2023-11-20 04:24:26,119 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 140900
2023-11-20 04:24:39,304 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 8650, loss[loss=0.1092, simple_loss=0.1461, pruned_loss=0.02816, audio_tagging_loss=0.007971, over 16467.00 frames. ], tot_loss[loss=0.08132, simple_loss=0.1014, pruned_loss=0.0205, audio_tagging_loss=0.01013, over 3053830.34 frames. ], batch size: 58, lr: 5.71e-03, grad_scale: 16.0
2023-11-20 04:25:25,081 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.65 vs. limit=15.0
2023-11-20 04:25:30,786 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 140950
2023-11-20 04:25:43,352 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 8700, loss[loss=0.08943, simple_loss=0.1128, pruned_loss=0.02602, audio_tagging_loss=0.00703, over 14416.00 frames. ], tot_loss[loss=0.08162, simple_loss=0.1017, pruned_loss=0.02056, audio_tagging_loss=0.01019, over 3059244.03 frames. ], batch size: 54, lr: 5.71e-03, grad_scale: 16.0
2023-11-20 04:25:50,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=939720.0, ans=0.0
2023-11-20 04:25:50,696 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.12 vs. limit=15.0
2023-11-20 04:26:13,597 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.908e+01 8.317e+01 9.149e+01 9.990e+01 1.361e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-20 04:26:19,264 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.55 vs. limit=22.5
2023-11-20 04:26:23,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=939920.0, ans=0.125
2023-11-20 04:26:35,177 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 141000
2023-11-20 04:26:38,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=939986.6666666666, ans=0.09899494936611666
2023-11-20 04:26:41,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=939986.6666666666, ans=0.0
2023-11-20 04:26:48,336 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 8750, loss[loss=0.09891, simple_loss=0.1371, pruned_loss=0.02262, audio_tagging_loss=0.007749, over 16885.00 frames. ], tot_loss[loss=0.08223, simple_loss=0.1025, pruned_loss=0.02077, audio_tagging_loss=0.01023, over 3063673.10 frames. ], batch size: 61, lr: 5.71e-03, grad_scale: 16.0
2023-11-20 04:26:54,317 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.52 vs. limit=6.0
2023-11-20 04:27:40,246 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 141050
2023-11-20 04:27:50,048 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.82 vs. limit=15.0
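
The tot_loss fields in these train_asr.py:1262 records are internally consistent with a fixed weighting of the three components: 0.5 x simple_loss + pruned_loss + audio_tagging_loss reproduces the leading loss value. A minimal sketch, assuming that weighting (inferred from the numbers in the surrounding records, not quoted from train_asr.py):

```python
def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float,
                  simple_loss_scale: float = 0.5) -> float:
    """Recombine the logged loss components into the leading `loss=` field.

    The 0.5 weight on simple_loss is an assumption inferred from the log.
    """
    return simple_loss_scale * simple_loss + pruned_loss + audio_tagging_loss


# Batch 8850 tot_loss below: 0.5 * 0.1021 + 0.0207 + 0.01034 = 0.08209,
# matching the logged loss= field exactly.
assert abs(combined_loss(0.1021, 0.0207, 0.01034) - 0.08209) < 1e-6
```
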
2023-11-20 04:27:53,795 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 8800, loss[loss=0.07389, simple_loss=0.09657, pruned_loss=0.01518, audio_tagging_loss=0.01043, over 15678.00 frames. ], tot_loss[loss=0.08145, simple_loss=0.1015, pruned_loss=0.0204, audio_tagging_loss=0.01032, over 3064621.36 frames. ], batch size: 57, lr: 5.71e-03, grad_scale: 32.0
2023-11-20 04:27:58,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=940386.6666666666, ans=0.0
2023-11-20 04:28:02,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=940386.6666666666, ans=0.125
2023-11-20 04:28:09,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=940453.3333333334, ans=0.0
2023-11-20 04:28:09,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=940453.3333333334, ans=0.1
2023-11-20 04:28:16,983 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.19 vs. limit=22.5
2023-11-20 04:28:22,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=940520.0, ans=0.0
2023-11-20 04:28:24,231 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.494e+01 8.276e+01 8.994e+01 9.890e+01 1.240e+02, threshold=1.799e+02, percent-clipped=0.0
2023-11-20 04:28:27,503 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.44 vs. limit=15.0
2023-11-20 04:28:30,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=940586.6666666666, ans=0.0
2023-11-20 04:28:31,069 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.59 vs. limit=6.0
2023-11-20 04:28:38,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=940586.6666666666, ans=0.125
2023-11-20 04:28:45,001 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 141100
2023-11-20 04:28:58,014 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 8850, loss[loss=0.08993, simple_loss=0.1104, pruned_loss=0.02628, audio_tagging_loss=0.00844, over 14828.00 frames. ], tot_loss[loss=0.08209, simple_loss=0.1021, pruned_loss=0.0207, audio_tagging_loss=0.01034, over 3062345.48 frames. ], batch size: 56, lr: 5.71e-03, grad_scale: 16.0
2023-11-20 04:29:02,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=940720.0, ans=0.125
2023-11-20 04:29:10,843 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
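
The exclusion warning above is explained by comparing its numbers: the cut's 100 feature frames shrink to 23 after the convolutional subsampling, fewer than its 24 BPE tokens, and a transducer-style alignment needs at least one encoder frame per token. A plausible sketch of such a filter (illustrative names and frame arithmetic; train_asr.py's actual check is not quoted here):

```python
def keep_cut(num_frames: int, num_tokens: int,
             subsampling_factor: int = 4) -> bool:
    """Drop cuts whose encoder output is shorter than the token sequence.

    The frame arithmetic is approximate: the front end in this log turns
    100 input frames into 23 outputs, consistent with an ~8-frame edge
    loss before the 4x reduction.
    """
    frames_after_subsampling = (num_frames - 8) // subsampling_factor
    return frames_after_subsampling >= num_tokens


# The cut above: 100 frames -> 23 after subsampling, but 24 tokens.
assert not keep_cut(num_frames=100, num_tokens=24)
```
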
2023-11-20 04:29:34,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=940853.3333333334, ans=0.0
2023-11-20 04:29:40,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=940920.0, ans=0.125
2023-11-20 04:29:49,277 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 141150
2023-11-20 04:29:54,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=940986.6666666666, ans=0.1
2023-11-20 04:29:58,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=940986.6666666666, ans=0.125
2023-11-20 04:29:59,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=940986.6666666666, ans=0.1
2023-11-20 04:30:01,945 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 8900, loss[loss=0.08187, simple_loss=0.09911, pruned_loss=0.02316, audio_tagging_loss=0.009161, over 15313.00 frames. ], tot_loss[loss=0.0824, simple_loss=0.1031, pruned_loss=0.02073, audio_tagging_loss=0.01011, over 3066845.49 frames. ], batch size: 58, lr: 5.71e-03, grad_scale: 8.0
2023-11-20 04:30:21,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=941120.0, ans=0.05
2023-11-20 04:30:27,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=941186.6666666666, ans=0.2
2023-11-20 04:30:28,314 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.68 vs. limit=15.0
2023-11-20 04:30:34,474 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.851e+01 8.057e+01 8.835e+01 9.972e+01 1.678e+02, threshold=1.767e+02, percent-clipped=0.0
2023-11-20 04:30:37,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=941186.6666666666, ans=0.2
2023-11-20 04:30:54,013 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 141200
2023-11-20 04:30:54,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=941320.0, ans=0.1
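
The scaling.py:213 lines, like the one just above, each report a named, schedule-driven hyperparameter: batch_count is the global training position and ans is the value the schedule yields there (skip rates that decay, probabilities that anneal, and so on). A minimal sketch of such a piecewise-linear schedule, with made-up breakpoints; icefall's actual ScheduledFloat class carries more machinery than this:

```python
import bisect


class PiecewiseLinear:
    """Value that interpolates linearly between (batch_count, value) points,
    clamping to the end values outside the breakpoint range."""

    def __init__(self, *points: tuple[float, float]):
        xs_ys = sorted(points)
        self.xs = [x for x, _ in xs_ys]
        self.ys = [y for _, y in xs_ys]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)


# Illustrative breakpoints only: a skip rate that starts at 0.2 and is 0.0
# from batch 4000 onward, as the many `ans=0.0` skip rates above suggest.
ff3_skip_rate = PiecewiseLinear((0.0, 0.2), (4000.0, 0.0))
print(ff3_skip_rate(941320.0))  # far past the last breakpoint -> 0.0
```
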
2023-11-20 04:31:07,525 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 8950, loss[loss=0.08577, simple_loss=0.1006, pruned_loss=0.02833, audio_tagging_loss=0.007118, over 14324.00 frames. ], tot_loss[loss=0.08213, simple_loss=0.1026, pruned_loss=0.02075, audio_tagging_loss=0.0101, over 3054175.18 frames. ], batch size: 55, lr: 5.70e-03, grad_scale: 8.0
2023-11-20 04:31:11,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=941386.6666666666, ans=0.1
2023-11-20 04:31:34,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=941520.0, ans=0.125
2023-11-20 04:31:58,584 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 141250
2023-11-20 04:31:59,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=941653.3333333334, ans=0.125
2023-11-20 04:32:06,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=941653.3333333334, ans=0.0
2023-11-20 04:32:10,714 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 9000, loss[loss=0.09481, simple_loss=0.1084, pruned_loss=0.02967, audio_tagging_loss=0.01091, over 14601.00 frames. ], tot_loss[loss=0.08213, simple_loss=0.1024, pruned_loss=0.02091, audio_tagging_loss=0.01003, over 3044537.05 frames. ], batch size: 58, lr: 5.71e-03, grad_scale: 8.0
2023-11-20 04:32:10,717 INFO [train_asr.py:1285] (0/4) Computing validation loss
2023-11-20 04:32:33,933 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.7592, 4.2280, 3.8295, 4.1076], device='cuda:0')
2023-11-20 04:32:53,271 INFO [train_asr.py:1294] (0/4) Epoch 12, validation: loss=0.06397, simple_loss=0.05412, pruned_loss=0.005869, audio_tagging_loss=0.03104, over 4681554.00 frames.
2023-11-20 04:32:53,272 INFO [train_asr.py:1295] (0/4) Maximum memory allocated so far is 25925MB
2023-11-20 04:32:59,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=941720.0, ans=0.0
2023-11-20 04:33:14,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=941786.6666666666, ans=0.125
2023-11-20 04:33:25,868 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.562e+01 8.268e+01 8.688e+01 9.407e+01 1.162e+02, threshold=1.738e+02, percent-clipped=0.0
2023-11-20 04:33:26,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=941853.3333333334, ans=0.1
2023-11-20 04:33:28,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=941853.3333333334, ans=0.1
2023-11-20 04:33:37,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=941920.0, ans=0.125
2023-11-20 04:33:37,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=941920.0, ans=0.125
2023-11-20 04:33:45,387 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 141300
2023-11-20 04:33:48,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=941986.6666666666, ans=0.95
2023-11-20 04:33:58,662 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 9050, loss[loss=0.08942, simple_loss=0.1168, pruned_loss=0.02297, audio_tagging_loss=0.00803, over 16340.00 frames. ], tot_loss[loss=0.08243, simple_loss=0.103, pruned_loss=0.02088, audio_tagging_loss=0.01005, over 3048724.23 frames.
], batch size: 58, lr: 5.70e-03, grad_scale: 8.0 2023-11-20 04:33:59,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=942053.3333333334, ans=0.04949747468305833 2023-11-20 04:34:10,251 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.140e-01 2023-11-20 04:34:25,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=942186.6666666666, ans=0.0 2023-11-20 04:34:28,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=942186.6666666666, ans=0.1 2023-11-20 04:34:30,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=942186.6666666666, ans=0.0 2023-11-20 04:34:33,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=942186.6666666666, ans=0.125 2023-11-20 04:34:35,226 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.80 vs. limit=6.0 2023-11-20 04:34:50,678 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 141350 2023-11-20 04:35:03,401 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 9100, loss[loss=0.09469, simple_loss=0.1295, pruned_loss=0.02319, audio_tagging_loss=0.006771, over 15745.00 frames. ], tot_loss[loss=0.08257, simple_loss=0.1035, pruned_loss=0.02091, audio_tagging_loss=0.009922, over 3050015.77 frames. ], batch size: 57, lr: 5.70e-03, grad_scale: 8.0 2023-11-20 04:35:09,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=942386.6666666666, ans=0.125 2023-11-20 04:35:16,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=942453.3333333334, ans=22.5 2023-11-20 04:35:22,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=942453.3333333334, ans=0.0 2023-11-20 04:35:36,018 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.481e+01 8.086e+01 8.915e+01 9.526e+01 1.275e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-20 04:35:41,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=942586.6666666666, ans=0.125 2023-11-20 04:35:55,771 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 141400 2023-11-20 04:35:55,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=942653.3333333334, ans=0.0 2023-11-20 04:36:01,760 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.74 vs. limit=22.5 2023-11-20 04:36:08,588 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 9150, loss[loss=0.08876, simple_loss=0.1208, pruned_loss=0.01994, audio_tagging_loss=0.008424, over 16352.00 frames. ], tot_loss[loss=0.08242, simple_loss=0.1034, pruned_loss=0.0208, audio_tagging_loss=0.009945, over 3063671.45 frames. 
], batch size: 58, lr: 5.70e-03, grad_scale: 8.0 2023-11-20 04:36:08,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=942720.0, ans=0.0 2023-11-20 04:36:42,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=942853.3333333334, ans=0.125 2023-11-20 04:37:00,711 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 141450 2023-11-20 04:37:11,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=942986.6666666666, ans=0.125 2023-11-20 04:37:14,202 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 9200, loss[loss=0.08532, simple_loss=0.09965, pruned_loss=0.0239, audio_tagging_loss=0.01159, over 15136.00 frames. ], tot_loss[loss=0.08287, simple_loss=0.1039, pruned_loss=0.02101, audio_tagging_loss=0.009901, over 3058133.39 frames. ], batch size: 55, lr: 5.70e-03, grad_scale: 16.0 2023-11-20 04:37:36,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=943120.0, ans=0.0 2023-11-20 04:37:45,715 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.820e+01 8.352e+01 9.147e+01 9.950e+01 1.226e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-20 04:37:47,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=943186.6666666666, ans=0.125 2023-11-20 04:37:52,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=943253.3333333334, ans=0.125 2023-11-20 04:38:00,062 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.77 vs. limit=15.0 2023-11-20 04:38:06,666 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 141500 2023-11-20 04:38:19,630 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 9250, loss[loss=0.08081, simple_loss=0.09997, pruned_loss=0.0197, audio_tagging_loss=0.01112, over 15087.00 frames. ], tot_loss[loss=0.08236, simple_loss=0.1032, pruned_loss=0.02083, audio_tagging_loss=0.009912, over 3061451.04 frames. ], batch size: 57, lr: 5.70e-03, grad_scale: 16.0 2023-11-20 04:38:26,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=943386.6666666666, ans=0.1 2023-11-20 04:38:27,615 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.15 vs. limit=15.0 2023-11-20 04:38:32,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=943453.3333333334, ans=0.125 2023-11-20 04:38:47,307 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.02 vs. 
limit=12.0
2023-11-20 04:39:08,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=943586.6666666666, ans=0.035
2023-11-20 04:39:11,447 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 141550
2023-11-20 04:39:14,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=943653.3333333334, ans=0.0
2023-11-20 04:39:23,866 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 9300, loss[loss=0.07691, simple_loss=0.08899, pruned_loss=0.02016, audio_tagging_loss=0.01225, over 15772.00 frames. ], tot_loss[loss=0.0824, simple_loss=0.1029, pruned_loss=0.02098, audio_tagging_loss=0.009993, over 3059409.29 frames. ], batch size: 62, lr: 5.70e-03, grad_scale: 16.0
2023-11-20 04:39:25,614 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=22.5
2023-11-20 04:39:30,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=943720.0, ans=0.0
2023-11-20 04:39:57,020 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.234e+01 8.210e+01 8.770e+01 9.348e+01 1.167e+02, threshold=1.754e+02, percent-clipped=0.0
2023-11-20 04:40:11,294 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.88 vs. limit=12.0
2023-11-20 04:40:15,581 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 141600
2023-11-20 04:40:23,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=943986.6666666666, ans=0.125
2023-11-20 04:40:25,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=943986.6666666666, ans=0.125
2023-11-20 04:40:28,819 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 9350, loss[loss=0.09277, simple_loss=0.1122, pruned_loss=0.02661, audio_tagging_loss=0.01008, over 15951.00 frames. ], tot_loss[loss=0.08246, simple_loss=0.1026, pruned_loss=0.02105, audio_tagging_loss=0.0101, over 3053061.81 frames. ], batch size: 59, lr: 5.70e-03, grad_scale: 16.0
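
In the optim.py:476 records, the relation between the columns is visible directly: in the 04:39:57,020 line above, threshold=1.754e+02 is exactly Clipping_scale=2.0 times the logged median grad norm 8.770e+01 (the same holds in the other clipping lines in this section), and percent-clipped is the fraction of recent batches whose norm exceeded that threshold. A sketch of median-tracking clipping consistent with those numbers, as an illustration rather than icefall's actual optimizer internals:

```python
from collections import deque

import torch


class QuartileGradClipper:
    """Clip gradients against `clipping_scale` times the running median
    gradient norm, keeping the quartiles around for logging."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 200):
        self.clipping_scale = clipping_scale
        self.norms: deque[float] = deque(maxlen=window)

    def clip_(self, parameters) -> tuple[torch.Tensor, float]:
        grads = [p.grad for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms.append(norm)
        quartiles = torch.quantile(
            torch.tensor(list(self.norms)),
            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * quartiles[2].item()  # 2.0 * median
        if norm > threshold:
            for g in grads:
                g.mul_(threshold / norm)  # scale grads down in place
        return quartiles, threshold
```
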
2023-11-20 04:40:31,505 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.46 vs. limit=10.0
2023-11-20 04:40:39,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=944053.3333333334, ans=0.125
2023-11-20 04:40:46,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=944120.0, ans=0.2
2023-11-20 04:40:54,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=944186.6666666666, ans=0.125
2023-11-20 04:41:14,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=944253.3333333334, ans=0.125
2023-11-20 04:41:21,528 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 141650
2023-11-20 04:41:30,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=944320.0, ans=0.125
2023-11-20 04:41:33,797 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 9400, loss[loss=0.09912, simple_loss=0.1105, pruned_loss=0.03206, audio_tagging_loss=0.01183, over 15511.00 frames. ], tot_loss[loss=0.08168, simple_loss=0.1013, pruned_loss=0.02074, audio_tagging_loss=0.01027, over 3048149.32 frames. ], batch size: 57, lr: 5.70e-03, grad_scale: 16.0
2023-11-20 04:41:58,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=944520.0, ans=0.2
2023-11-20 04:42:04,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=944520.0, ans=0.0
2023-11-20 04:42:05,743 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.961e+01 8.348e+01 8.869e+01 9.935e+01 1.327e+02, threshold=1.774e+02, percent-clipped=0.0
2023-11-20 04:42:22,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=944586.6666666666, ans=0.125
2023-11-20 04:42:26,298 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 141700
2023-11-20 04:42:37,252 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 04:42:38,430 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 9450, loss[loss=0.05661, simple_loss=0.06562, pruned_loss=0.01387, audio_tagging_loss=0.009929, over 15735.00 frames. ], tot_loss[loss=0.08162, simple_loss=0.1014, pruned_loss=0.02061, audio_tagging_loss=0.01033, over 3049145.57 frames. ], batch size: 62, lr: 5.70e-03, grad_scale: 16.0
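
The scaling.py:1022 Whitening lines throughout this log compare a per-module whiteness statistic against a configured limit; in the healthy records here the metric stays under its limit. One simple statistic with the right behaviour, used below purely to illustrate the idea and not icefall's exact formula, is n * tr(C^2) / tr(C)^2 for the channel covariance C over n channels: it is 1.0 for perfectly white features and approaches n when a single direction carries all the energy, so a penalty can be applied whenever it exceeds the limit:

```python
import torch


def whitening_metric(x: torch.Tensor) -> float:
    """Whiteness of activations x with shape (num_frames, num_channels):
    close to 1.0 when the channel covariance is isotropic, approaching
    num_channels when one direction dominates."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    n = x.shape[1]
    return (n * (cov @ cov).trace() / cov.trace() ** 2).item()


white = torch.randn(10_000, 192)                 # metric close to 1.0
skewed = white * torch.linspace(0.1, 3.0, 192)   # anisotropic: metric grows
print(whitening_metric(white), whitening_metric(skewed))
```
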
2023-11-20 04:42:44,391 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.63 vs. limit=15.0
2023-11-20 04:43:02,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=944786.6666666666, ans=0.125
2023-11-20 04:43:18,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=944920.0, ans=0.1
2023-11-20 04:43:18,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=944920.0, ans=0.125
2023-11-20 04:43:30,231 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 141750
2023-11-20 04:43:42,768 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 9500, loss[loss=0.0881, simple_loss=0.1061, pruned_loss=0.02531, audio_tagging_loss=0.009743, over 13925.00 frames. ], tot_loss[loss=0.08131, simple_loss=0.1011, pruned_loss=0.02041, audio_tagging_loss=0.01036, over 3046263.68 frames. ], batch size: 54, lr: 5.70e-03, grad_scale: 16.0
2023-11-20 04:43:47,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=945053.3333333334, ans=0.1
2023-11-20 04:43:55,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=945120.0, ans=0.04949747468305833
2023-11-20 04:44:07,497 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.67 vs. limit=15.0
2023-11-20 04:44:15,582 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.845e+01 8.327e+01 9.041e+01 9.892e+01 1.668e+02, threshold=1.808e+02, percent-clipped=0.0
2023-11-20 04:44:35,002 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 141800
2023-11-20 04:44:35,434 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.26 vs. limit=22.5
2023-11-20 04:44:38,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=945320.0, ans=0.1
2023-11-20 04:44:38,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=945320.0, ans=0.125
2023-11-20 04:44:48,686 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 9550, loss[loss=0.0666, simple_loss=0.07655, pruned_loss=0.01389, audio_tagging_loss=0.01443, over 14478.00 frames. ], tot_loss[loss=0.08303, simple_loss=0.1033, pruned_loss=0.02103, audio_tagging_loss=0.01035, over 3043706.96 frames.
], batch size: 55, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:44:51,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=945386.6666666666, ans=0.125 2023-11-20 04:45:09,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=945453.3333333334, ans=0.2 2023-11-20 04:45:35,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=945586.6666666666, ans=0.2 2023-11-20 04:45:36,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=945586.6666666666, ans=0.125 2023-11-20 04:45:38,591 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.923e-02 2023-11-20 04:45:40,902 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 141850 2023-11-20 04:45:48,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=945653.3333333334, ans=0.125 2023-11-20 04:45:54,289 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 9600, loss[loss=0.09963, simple_loss=0.1346, pruned_loss=0.02633, audio_tagging_loss=0.005995, over 15652.00 frames. ], tot_loss[loss=0.0831, simple_loss=0.1033, pruned_loss=0.02101, audio_tagging_loss=0.01045, over 3045935.15 frames. ], batch size: 55, lr: 5.69e-03, grad_scale: 32.0 2023-11-20 04:46:08,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=945786.6666666666, ans=0.0 2023-11-20 04:46:20,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=945853.3333333334, ans=0.0 2023-11-20 04:46:24,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=945853.3333333334, ans=0.125 2023-11-20 04:46:26,479 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.916e+01 8.231e+01 8.901e+01 9.790e+01 1.400e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-20 04:46:31,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=945853.3333333334, ans=0.09899494936611666 2023-11-20 04:46:37,141 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.01 vs. limit=22.5 2023-11-20 04:46:39,548 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.52 vs. limit=22.5 2023-11-20 04:46:45,565 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.00 vs. limit=15.0 2023-11-20 04:46:46,229 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 141900 2023-11-20 04:46:47,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=945986.6666666666, ans=0.125 2023-11-20 04:46:58,335 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 9650, loss[loss=0.09899, simple_loss=0.1294, pruned_loss=0.02518, audio_tagging_loss=0.009108, over 14854.00 frames. ], tot_loss[loss=0.0829, simple_loss=0.1029, pruned_loss=0.02096, audio_tagging_loss=0.01049, over 3039281.06 frames. 
], batch size: 53, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:47:00,678 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.50 vs. limit=15.0 2023-11-20 04:47:01,982 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.71 vs. limit=15.0 2023-11-20 04:47:10,221 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.76 vs. limit=15.0 2023-11-20 04:47:12,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=946120.0, ans=0.125 2023-11-20 04:47:32,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=946186.6666666666, ans=0.125 2023-11-20 04:47:50,333 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 141950 2023-11-20 04:48:03,360 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 9700, loss[loss=0.06801, simple_loss=0.08125, pruned_loss=0.01567, audio_tagging_loss=0.01171, over 15900.00 frames. ], tot_loss[loss=0.083, simple_loss=0.1034, pruned_loss=0.02102, audio_tagging_loss=0.01028, over 3042582.71 frames. ], batch size: 62, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:48:08,685 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.67 vs. limit=15.0 2023-11-20 04:48:19,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=946453.3333333334, ans=12.0 2023-11-20 04:48:28,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=946520.0, ans=0.125 2023-11-20 04:48:34,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=946520.0, ans=0.125 2023-11-20 04:48:36,941 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.885e+01 8.155e+01 8.941e+01 9.505e+01 1.207e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-20 04:48:42,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=946586.6666666666, ans=0.125 2023-11-20 04:48:55,408 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 142000 2023-11-20 04:49:01,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=946653.3333333334, ans=0.2 2023-11-20 04:49:08,454 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 9750, loss[loss=0.08449, simple_loss=0.1078, pruned_loss=0.02362, audio_tagging_loss=0.006973, over 14398.00 frames. ], tot_loss[loss=0.08238, simple_loss=0.103, pruned_loss=0.02074, audio_tagging_loss=0.01014, over 3041713.38 frames. ], batch size: 55, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:49:11,411 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.44 vs. 
limit=10.0 2023-11-20 04:49:29,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=946786.6666666666, ans=0.02 2023-11-20 04:49:45,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=946920.0, ans=0.125 2023-11-20 04:49:48,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=946920.0, ans=0.125 2023-11-20 04:50:00,588 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 142050 2023-11-20 04:50:12,902 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 9800, loss[loss=0.08223, simple_loss=0.1073, pruned_loss=0.01866, audio_tagging_loss=0.0099, over 15409.00 frames. ], tot_loss[loss=0.08193, simple_loss=0.1022, pruned_loss=0.02066, audio_tagging_loss=0.01015, over 3040997.25 frames. ], batch size: 57, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:50:16,313 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.10 vs. limit=22.5 2023-11-20 04:50:20,538 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0 2023-11-20 04:50:33,560 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2023-11-20 04:50:46,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=947186.6666666666, ans=0.2 2023-11-20 04:50:47,335 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.379e+01 8.314e+01 8.707e+01 9.702e+01 1.155e+02, threshold=1.741e+02, percent-clipped=0.0 2023-11-20 04:51:04,555 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 142100 2023-11-20 04:51:08,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=947320.0, ans=0.125 2023-11-20 04:51:10,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=947320.0, ans=0.125 2023-11-20 04:51:11,782 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 04:51:17,960 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 9850, loss[loss=0.04425, simple_loss=0.0515, pruned_loss=0.006937, audio_tagging_loss=0.01157, over 15127.00 frames. ], tot_loss[loss=0.0813, simple_loss=0.1016, pruned_loss=0.02044, audio_tagging_loss=0.01008, over 3044606.09 frames. 
], batch size: 59, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:51:43,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=947520.0, ans=0.04949747468305833 2023-11-20 04:51:53,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=947520.0, ans=0.125 2023-11-20 04:52:09,666 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 142150 2023-11-20 04:52:13,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=947653.3333333334, ans=0.125 2023-11-20 04:52:16,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=947653.3333333334, ans=0.125 2023-11-20 04:52:17,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=947653.3333333334, ans=0.0 2023-11-20 04:52:22,403 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 9900, loss[loss=0.0732, simple_loss=0.0859, pruned_loss=0.0165, audio_tagging_loss=0.01375, over 15116.00 frames. ], tot_loss[loss=0.08131, simple_loss=0.1015, pruned_loss=0.02043, audio_tagging_loss=0.01011, over 3035097.75 frames. ], batch size: 58, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:52:47,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=947853.3333333334, ans=0.125 2023-11-20 04:52:56,051 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.295e+01 8.098e+01 8.892e+01 9.710e+01 1.368e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-20 04:52:57,885 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.28 vs. limit=12.0 2023-11-20 04:53:14,537 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 142200 2023-11-20 04:53:17,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=947986.6666666666, ans=0.1 2023-11-20 04:53:23,115 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.82 vs. limit=10.0 2023-11-20 04:53:27,289 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 9950, loss[loss=0.08434, simple_loss=0.0918, pruned_loss=0.02623, audio_tagging_loss=0.01221, over 14627.00 frames. ], tot_loss[loss=0.08138, simple_loss=0.1015, pruned_loss=0.02056, audio_tagging_loss=0.01009, over 3050699.64 frames. ], batch size: 59, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:53:35,606 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.14 vs. 
limit=22.5 2023-11-20 04:53:47,052 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 04:53:47,160 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.350e-02 2023-11-20 04:54:01,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=948186.6666666666, ans=0.0 2023-11-20 04:54:02,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=948186.6666666666, ans=0.0 2023-11-20 04:54:08,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=948253.3333333334, ans=10.0 2023-11-20 04:54:12,086 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.68 vs. limit=15.0 2023-11-20 04:54:18,902 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 142250 2023-11-20 04:54:32,609 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 10000, loss[loss=0.0778, simple_loss=0.09793, pruned_loss=0.017, audio_tagging_loss=0.01184, over 14718.00 frames. ], tot_loss[loss=0.08181, simple_loss=0.1021, pruned_loss=0.02064, audio_tagging_loss=0.01012, over 3051054.68 frames. ], batch size: 57, lr: 5.69e-03, grad_scale: 32.0 2023-11-20 04:54:39,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=948386.6666666666, ans=0.0 2023-11-20 04:54:39,406 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.98 vs. limit=15.0 2023-11-20 04:54:47,451 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.46 vs. limit=22.5 2023-11-20 04:55:03,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=948520.0, ans=0.2 2023-11-20 04:55:05,760 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.530e+01 8.242e+01 9.183e+01 1.026e+02 1.433e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-20 04:55:09,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=948586.6666666666, ans=0.125 2023-11-20 04:55:16,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=948586.6666666666, ans=0.125 2023-11-20 04:55:24,378 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 142300 2023-11-20 04:55:34,199 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.99 vs. limit=15.0 2023-11-20 04:55:37,135 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 10050, loss[loss=0.08361, simple_loss=0.1012, pruned_loss=0.02317, audio_tagging_loss=0.009818, over 14396.00 frames. ], tot_loss[loss=0.08202, simple_loss=0.1022, pruned_loss=0.02077, audio_tagging_loss=0.01013, over 3048572.40 frames. 
], batch size: 57, lr: 5.68e-03, grad_scale: 32.0 2023-11-20 04:55:40,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=948720.0, ans=0.1 2023-11-20 04:55:43,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=948720.0, ans=0.125 2023-11-20 04:55:46,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=948720.0, ans=0.2 2023-11-20 04:55:55,301 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.86 vs. limit=15.0 2023-11-20 04:56:04,332 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.35 vs. limit=15.0 2023-11-20 04:56:15,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=948920.0, ans=0.2 2023-11-20 04:56:28,234 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 142350 2023-11-20 04:56:37,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=948986.6666666666, ans=0.07 2023-11-20 04:56:41,026 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 10100, loss[loss=0.1106, simple_loss=0.1456, pruned_loss=0.02888, audio_tagging_loss=0.008985, over 15238.00 frames. ], tot_loss[loss=0.08177, simple_loss=0.1017, pruned_loss=0.02071, audio_tagging_loss=0.01022, over 3044556.93 frames. ], batch size: 56, lr: 5.68e-03, grad_scale: 16.0 2023-11-20 04:56:51,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=949053.3333333334, ans=0.125 2023-11-20 04:56:56,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=949120.0, ans=0.1 2023-11-20 04:56:59,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=949120.0, ans=0.0 2023-11-20 04:57:16,166 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.858e+01 8.344e+01 8.796e+01 9.668e+01 1.145e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-20 04:57:29,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=949253.3333333334, ans=0.1 2023-11-20 04:57:32,850 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 04:57:32,887 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 142400 2023-11-20 04:57:38,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=949320.0, ans=0.0 2023-11-20 04:57:42,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=949320.0, ans=0.125 2023-11-20 04:57:46,705 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 10150, loss[loss=0.08394, simple_loss=0.1056, pruned_loss=0.01956, audio_tagging_loss=0.01157, over 15395.00 frames. ], tot_loss[loss=0.08222, simple_loss=0.1023, pruned_loss=0.02089, audio_tagging_loss=0.01019, over 3051384.17 frames. ], batch size: 56, lr: 5.68e-03, grad_scale: 16.0 2023-11-20 04:58:07,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=949453.3333333334, ans=0.025 2023-11-20 04:58:16,881 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 04:58:21,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=949520.0, ans=0.2 2023-11-20 04:58:28,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=949586.6666666666, ans=0.125 2023-11-20 04:58:38,510 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 142450 2023-11-20 04:58:44,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=949653.3333333334, ans=0.125 2023-11-20 04:58:51,113 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 10200, loss[loss=0.09755, simple_loss=0.1126, pruned_loss=0.03183, audio_tagging_loss=0.009434, over 14734.00 frames. ], tot_loss[loss=0.08232, simple_loss=0.1023, pruned_loss=0.02089, audio_tagging_loss=0.01027, over 3048264.10 frames. ], batch size: 57, lr: 5.68e-03, grad_scale: 16.0 2023-11-20 04:58:54,160 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.02 vs. limit=15.0 2023-11-20 04:59:10,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=949786.6666666666, ans=0.125 2023-11-20 04:59:10,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=949786.6666666666, ans=0.0 2023-11-20 04:59:11,391 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.54 vs. limit=22.5 2023-11-20 04:59:13,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=949786.6666666666, ans=0.1 2023-11-20 04:59:15,459 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 04:59:24,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=949853.3333333334, ans=0.125 2023-11-20 04:59:26,223 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.455e+01 8.468e+01 8.830e+01 9.466e+01 1.234e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-20 04:59:35,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=949920.0, ans=0.125 2023-11-20 04:59:41,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=949986.6666666666, ans=0.1 2023-11-20 04:59:42,868 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 142500 2023-11-20 04:59:44,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=949986.6666666666, ans=0.125 2023-11-20 04:59:54,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=950053.3333333334, ans=0.125 2023-11-20 04:59:54,935 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 10250, loss[loss=0.08763, simple_loss=0.1144, pruned_loss=0.02141, audio_tagging_loss=0.009033, over 15449.00 frames. ], tot_loss[loss=0.08176, simple_loss=0.1014, pruned_loss=0.02058, audio_tagging_loss=0.01049, over 3051047.68 frames. ], batch size: 58, lr: 5.68e-03, grad_scale: 16.0 2023-11-20 05:00:03,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=950053.3333333334, ans=0.125 2023-11-20 05:00:31,288 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.19 vs. limit=15.0 2023-11-20 05:00:46,645 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 142550 2023-11-20 05:01:00,097 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 10300, loss[loss=0.07943, simple_loss=0.09842, pruned_loss=0.01897, audio_tagging_loss=0.01125, over 15702.00 frames. ], tot_loss[loss=0.08143, simple_loss=0.101, pruned_loss=0.02039, audio_tagging_loss=0.01054, over 3048305.95 frames. 
], batch size: 59, lr: 5.68e-03, grad_scale: 16.0 2023-11-20 05:01:03,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=950386.6666666666, ans=0.125 2023-11-20 05:01:22,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=950453.3333333334, ans=0.1 2023-11-20 05:01:27,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=950520.0, ans=0.0 2023-11-20 05:01:34,322 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.811e+01 8.101e+01 8.687e+01 9.429e+01 1.201e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-20 05:01:39,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=950586.6666666666, ans=0.125 2023-11-20 05:01:43,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=950586.6666666666, ans=0.125 2023-11-20 05:01:51,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=950653.3333333334, ans=0.0 2023-11-20 05:01:52,452 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 142600 2023-11-20 05:02:02,942 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.24 vs. limit=15.0 2023-11-20 05:02:04,869 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 10350, loss[loss=0.09849, simple_loss=0.1178, pruned_loss=0.02847, audio_tagging_loss=0.0111, over 16580.00 frames. ], tot_loss[loss=0.08183, simple_loss=0.1016, pruned_loss=0.02055, audio_tagging_loss=0.01048, over 3053382.52 frames. ], batch size: 60, lr: 5.68e-03, grad_scale: 16.0 2023-11-20 05:02:15,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=950720.0, ans=0.0 2023-11-20 05:02:57,481 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 142650 2023-11-20 05:03:09,691 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 10400, loss[loss=0.06128, simple_loss=0.0683, pruned_loss=0.01225, audio_tagging_loss=0.01488, over 14248.00 frames. ], tot_loss[loss=0.08143, simple_loss=0.1007, pruned_loss=0.0204, audio_tagging_loss=0.01068, over 3052597.46 frames. 
], batch size: 54, lr: 5.68e-03, grad_scale: 32.0 2023-11-20 05:03:22,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=951120.0, ans=0.125 2023-11-20 05:03:30,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=951120.0, ans=0.0 2023-11-20 05:03:45,414 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.768e+01 8.746e+01 9.185e+01 9.994e+01 1.378e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-20 05:03:50,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=951253.3333333334, ans=0.025 2023-11-20 05:03:54,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=951253.3333333334, ans=0.5 2023-11-20 05:04:01,618 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 142700 2023-11-20 05:04:14,405 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 10450, loss[loss=0.06166, simple_loss=0.08564, pruned_loss=0.01243, audio_tagging_loss=0.006413, over 15440.00 frames. ], tot_loss[loss=0.08107, simple_loss=0.1004, pruned_loss=0.02036, audio_tagging_loss=0.01054, over 3049784.11 frames. ], batch size: 56, lr: 5.68e-03, grad_scale: 32.0 2023-11-20 05:04:21,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=951386.6666666666, ans=0.0 2023-11-20 05:04:26,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=951453.3333333334, ans=0.1 2023-11-20 05:04:26,700 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.58 vs. limit=12.0 2023-11-20 05:05:06,121 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 142750 2023-11-20 05:05:09,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=951653.3333333334, ans=0.0 2023-11-20 05:05:14,575 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.08 vs. limit=15.0 2023-11-20 05:05:18,699 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 10500, loss[loss=0.09468, simple_loss=0.112, pruned_loss=0.02933, audio_tagging_loss=0.009359, over 15521.00 frames. ], tot_loss[loss=0.0811, simple_loss=0.1006, pruned_loss=0.0204, audio_tagging_loss=0.01042, over 3042906.93 frames. 
], batch size: 59, lr: 5.68e-03, grad_scale: 32.0 2023-11-20 05:05:20,279 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.370e-01 2023-11-20 05:05:22,137 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 05:05:40,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=951786.6666666666, ans=0.125 2023-11-20 05:05:43,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=951853.3333333334, ans=0.04949747468305833 2023-11-20 05:05:52,957 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.818e+01 8.079e+01 8.976e+01 9.766e+01 1.332e+02, threshold=1.795e+02, percent-clipped=0.0 2023-11-20 05:05:55,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=951853.3333333334, ans=0.125 2023-11-20 05:06:01,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=951920.0, ans=0.0 2023-11-20 05:06:08,845 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.57 vs. limit=22.5 2023-11-20 05:06:10,136 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 142800 2023-11-20 05:06:22,642 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 10550, loss[loss=0.08751, simple_loss=0.1025, pruned_loss=0.02563, audio_tagging_loss=0.01062, over 14432.00 frames. ], tot_loss[loss=0.08151, simple_loss=0.1014, pruned_loss=0.02064, audio_tagging_loss=0.01019, over 3042665.29 frames. ], batch size: 55, lr: 5.67e-03, grad_scale: 32.0 2023-11-20 05:06:22,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=952053.3333333334, ans=0.125 2023-11-20 05:06:26,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=952053.3333333334, ans=0.125 2023-11-20 05:06:33,037 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.83 vs. 
limit=15.0 2023-11-20 05:06:33,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=952120.0, ans=0.125 2023-11-20 05:06:46,700 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 05:07:14,182 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 142850 2023-11-20 05:07:19,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=952320.0, ans=0.0 2023-11-20 05:07:20,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=952320.0, ans=0.125 2023-11-20 05:07:24,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=952320.0, ans=0.125 2023-11-20 05:07:25,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=952386.6666666666, ans=0.0 2023-11-20 05:07:26,290 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 10600, loss[loss=0.08487, simple_loss=0.1039, pruned_loss=0.02299, audio_tagging_loss=0.009942, over 16486.00 frames. ], tot_loss[loss=0.08132, simple_loss=0.1012, pruned_loss=0.02057, audio_tagging_loss=0.01017, over 3048490.45 frames. ], batch size: 61, lr: 5.67e-03, grad_scale: 32.0 2023-11-20 05:08:01,772 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.830e+01 8.408e+01 9.197e+01 1.017e+02 1.438e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-20 05:08:18,717 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 142900 2023-11-20 05:08:18,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=952653.3333333334, ans=0.125 2023-11-20 05:08:31,789 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 10650, loss[loss=0.0712, simple_loss=0.09548, pruned_loss=0.01425, audio_tagging_loss=0.00921, over 15434.00 frames. ], tot_loss[loss=0.08075, simple_loss=0.1006, pruned_loss=0.02032, audio_tagging_loss=0.01012, over 3048866.21 frames. ], batch size: 57, lr: 5.67e-03, grad_scale: 32.0 2023-11-20 05:08:37,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=952720.0, ans=0.125 2023-11-20 05:08:55,368 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.40 vs. 
limit=10.0 2023-11-20 05:09:03,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=952853.3333333334, ans=0.1 2023-11-20 05:09:11,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=952920.0, ans=0.2 2023-11-20 05:09:23,757 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 142950 2023-11-20 05:09:32,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=952986.6666666666, ans=0.125 2023-11-20 05:09:35,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=953053.3333333334, ans=0.125 2023-11-20 05:09:36,515 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 10700, loss[loss=0.07842, simple_loss=0.09801, pruned_loss=0.02076, audio_tagging_loss=0.008656, over 15113.00 frames. ], tot_loss[loss=0.08097, simple_loss=0.1012, pruned_loss=0.02039, audio_tagging_loss=0.009967, over 3043899.11 frames. ], batch size: 59, lr: 5.67e-03, grad_scale: 32.0 2023-11-20 05:10:05,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=953186.6666666666, ans=0.1 2023-11-20 05:10:11,505 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.004e+01 8.248e+01 8.993e+01 9.641e+01 1.206e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-20 05:10:23,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=953253.3333333334, ans=0.0 2023-11-20 05:10:24,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=953253.3333333334, ans=0.125 2023-11-20 05:10:28,374 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 143000 2023-11-20 05:10:40,874 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 10750, loss[loss=0.08618, simple_loss=0.1163, pruned_loss=0.0191, audio_tagging_loss=0.008957, over 15761.00 frames. ], tot_loss[loss=0.08087, simple_loss=0.1015, pruned_loss=0.02021, audio_tagging_loss=0.009903, over 3049033.22 frames. ], batch size: 58, lr: 5.67e-03, grad_scale: 32.0 2023-11-20 05:10:43,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=953386.6666666666, ans=0.0 2023-11-20 05:11:09,917 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.81 vs. limit=15.0 2023-11-20 05:11:11,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=953520.0, ans=0.125 2023-11-20 05:11:32,710 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 143050 2023-11-20 05:11:43,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=953653.3333333334, ans=0.125 2023-11-20 05:11:45,915 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 10800, loss[loss=0.06995, simple_loss=0.09256, pruned_loss=0.01349, audio_tagging_loss=0.01018, over 14978.00 frames. ], tot_loss[loss=0.08032, simple_loss=0.1007, pruned_loss=0.02006, audio_tagging_loss=0.009932, over 3042052.09 frames. 
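The optim.py:476 entries make the clipping rule visible: the reported threshold is always clipping_scale (2.0) times the median quartile, e.g. 2.0 x 8.993e+01 = 1.799e+02 in the nearest entry above, so the clip level adapts to a window of recent gradient norms. A sketch of that mechanism; the class and buffer names are ours, and icefall folds this bookkeeping into ScaledAdam rather than a standalone helper:

    from collections import deque
    import torch

    class GradNormClipper:
        """Adaptive clipping: threshold = clipping_scale * median of a
        window of recent global grad norms (cf. the quartiles logged
        above by optim.py:476)."""
        def __init__(self, clipping_scale: float = 2.0, window: int = 128):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)

        def clip_(self, params) -> float:
            grads = [p.grad for p in params if p.grad is not None]
            norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
            self.norms.append(norm)
            hist = torch.tensor(sorted(self.norms))
            quartiles = [hist[int(q * (len(hist) - 1))].item()
                         for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
            threshold = self.clipping_scale * quartiles[2]  # 2.0 * median
            if norm > threshold:
                for g in grads:
                    g.mul_(threshold / norm)  # rescale, do not zero
            return norm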
], batch size: 58, lr: 5.67e-03, grad_scale: 32.0 2023-11-20 05:11:46,509 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.14 vs. limit=15.0 2023-11-20 05:11:47,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=953720.0, ans=0.04949747468305833 2023-11-20 05:11:50,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=953720.0, ans=0.2 2023-11-20 05:12:10,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=953853.3333333334, ans=0.0 2023-11-20 05:12:15,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=953853.3333333334, ans=0.2 2023-11-20 05:12:15,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=953853.3333333334, ans=0.0 2023-11-20 05:12:15,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=953853.3333333334, ans=0.0 2023-11-20 05:12:20,215 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.594e+01 7.886e+01 8.544e+01 9.371e+01 1.667e+02, threshold=1.709e+02, percent-clipped=0.0 2023-11-20 05:12:26,556 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.78 vs. limit=15.0 2023-11-20 05:12:29,010 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.62 vs. limit=22.5 2023-11-20 05:12:36,976 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.58 vs. limit=15.0 2023-11-20 05:12:37,713 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 143100 2023-11-20 05:12:44,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=953986.6666666666, ans=0.0 2023-11-20 05:12:50,254 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 10850, loss[loss=0.08983, simple_loss=0.1012, pruned_loss=0.02803, audio_tagging_loss=0.01121, over 14879.00 frames. ], tot_loss[loss=0.0796, simple_loss=0.09935, pruned_loss=0.01981, audio_tagging_loss=0.01012, over 3042885.58 frames. 
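The scaling.py:1022 "Whitening" entries compare a whiteness statistic of a module's activations against a limit; the auxiliary penalty only engages when the metric exceeds the limit, and most entries above stay below it. A hedged sketch of such a metric, which is exactly 1.0 when the grouped channel covariance is a multiple of the identity; icefall's _whitening_metric may differ in detail:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
        # x: (num_frames, num_channels); metric == 1.0 for white features,
        # larger the more correlated / unequal the channels are.
        num_frames, num_channels = x.shape
        c = num_channels // num_groups
        x = x.reshape(num_frames, num_groups, c).transpose(0, 1)  # (g, N, c)
        covar = torch.matmul(x.transpose(1, 2), x) / num_frames   # (g, c, c)
        mean_diag = covar.diagonal(dim1=1, dim2=2).mean()
        mean_sq = (covar ** 2).sum() / (num_groups * c)
        # ratio of mean squared eigenvalue to squared mean eigenvalue
        return mean_sq / (mean_diag ** 2 + 1e-20)

    white = torch.randn(10000, 256)
    print(float(whitening_metric(white, num_groups=1)))  # ~1.0 for white noise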
], batch size: 56, lr: 5.67e-03, grad_scale: 16.0 2023-11-20 05:13:00,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=954053.3333333334, ans=0.1 2023-11-20 05:13:00,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=954053.3333333334, ans=0.125 2023-11-20 05:13:28,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=954253.3333333334, ans=0.125 2023-11-20 05:13:41,485 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 143150 2023-11-20 05:13:42,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=954320.0, ans=0.125 2023-11-20 05:13:45,826 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.66 vs. limit=15.0 2023-11-20 05:13:50,140 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 05:13:53,067 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=22.5 2023-11-20 05:13:53,697 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 10900, loss[loss=0.08183, simple_loss=0.09936, pruned_loss=0.0229, audio_tagging_loss=0.009251, over 14791.00 frames. ], tot_loss[loss=0.07998, simple_loss=0.0996, pruned_loss=0.01999, audio_tagging_loss=0.01019, over 3036612.69 frames. ], batch size: 57, lr: 5.67e-03, grad_scale: 16.0 2023-11-20 05:14:04,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=954386.6666666666, ans=0.1 2023-11-20 05:14:13,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=954453.3333333334, ans=0.125 2023-11-20 05:14:18,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=954520.0, ans=0.0 2023-11-20 05:14:22,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=954520.0, ans=0.2 2023-11-20 05:14:30,501 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.839e+01 8.320e+01 9.175e+01 1.028e+02 1.481e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-20 05:14:31,355 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.11 vs. 
limit=15.0 2023-11-20 05:14:41,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=954586.6666666666, ans=0.125 2023-11-20 05:14:45,551 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 143200 2023-11-20 05:14:47,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=954653.3333333334, ans=0.1 2023-11-20 05:14:57,257 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 05:14:59,425 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 10950, loss[loss=0.07422, simple_loss=0.08732, pruned_loss=0.01848, audio_tagging_loss=0.01208, over 16823.00 frames. ], tot_loss[loss=0.07975, simple_loss=0.09954, pruned_loss=0.01983, audio_tagging_loss=0.01015, over 3036423.75 frames. ], batch size: 63, lr: 5.67e-03, grad_scale: 16.0 2023-11-20 05:15:05,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=954720.0, ans=0.125 2023-11-20 05:15:11,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=954786.6666666666, ans=0.125 2023-11-20 05:15:27,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=954853.3333333334, ans=0.125 2023-11-20 05:15:29,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=954853.3333333334, ans=0.09899494936611666 2023-11-20 05:15:37,032 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 05:15:38,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=954920.0, ans=0.125 2023-11-20 05:15:51,652 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 143250 2023-11-20 05:16:04,522 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 11000, loss[loss=0.1018, simple_loss=0.1301, pruned_loss=0.02839, audio_tagging_loss=0.008304, over 15737.00 frames. ], tot_loss[loss=0.08063, simple_loss=0.1005, pruned_loss=0.02019, audio_tagging_loss=0.0102, over 3041194.06 frames. ], batch size: 57, lr: 5.67e-03, grad_scale: 16.0 2023-11-20 05:16:06,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=955053.3333333334, ans=0.0 2023-11-20 05:16:10,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=955053.3333333334, ans=0.125 2023-11-20 05:16:15,014 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 05:16:23,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=955120.0, ans=0.1 2023-11-20 05:16:40,612 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.137e+01 8.224e+01 8.941e+01 1.006e+02 1.362e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-20 05:16:40,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=955186.6666666666, ans=0.125 2023-11-20 05:16:50,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=955253.3333333334, ans=0.95 2023-11-20 05:16:56,677 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 143300 2023-11-20 05:17:03,531 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.49 vs. limit=15.0 2023-11-20 05:17:08,884 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 11050, loss[loss=0.09377, simple_loss=0.1176, pruned_loss=0.02437, audio_tagging_loss=0.01063, over 14911.00 frames. ], tot_loss[loss=0.08121, simple_loss=0.101, pruned_loss=0.02043, audio_tagging_loss=0.0103, over 3048702.65 frames. ], batch size: 55, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:17:11,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=955386.6666666666, ans=0.1 2023-11-20 05:17:39,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=955520.0, ans=0.125 2023-11-20 05:17:42,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=955520.0, ans=0.2 2023-11-20 05:17:44,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=955520.0, ans=0.125 2023-11-20 05:17:50,590 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.36 vs. limit=15.0 2023-11-20 05:17:52,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=955586.6666666666, ans=0.1 2023-11-20 05:18:00,898 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 143350 2023-11-20 05:18:04,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=955653.3333333334, ans=10.0 2023-11-20 05:18:12,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=955720.0, ans=0.125 2023-11-20 05:18:14,382 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 11100, loss[loss=0.08925, simple_loss=0.1108, pruned_loss=0.02362, audio_tagging_loss=0.01021, over 16014.00 frames. ], tot_loss[loss=0.08162, simple_loss=0.1015, pruned_loss=0.02048, audio_tagging_loss=0.01038, over 3053916.90 frames. 
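The train_asr.py:1506 warnings above are a data-sanity filter: a transducer cannot emit more tokens than it has encoder frames, so 1-second AudioSet cuts whose dummy transcript tokenizes to more tokens (24) than the post-subsampling frame count (23) are excluded. The predicate reduces to something like the sketch below; the function name is ours:

    # Hedged sketch of the cut filter behind the warnings above: drop any
    # cut whose encoder output is shorter than its token sequence
    # (23 frames < 24 tokens for these 1.000 s AudioSet cuts).
    def keep_cut(frames_after_subsampling: int, num_tokens: int) -> bool:
        return frames_after_subsampling >= num_tokens

    assert keep_cut(23, 24) is False   # the excluded cuts above
    assert keep_cut(23, 20) is True    # a shorter transcript would pass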
], batch size: 59, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:18:28,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=955786.6666666666, ans=0.0 2023-11-20 05:18:43,099 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.48 vs. limit=10.0 2023-11-20 05:18:45,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=955853.3333333334, ans=0.125 2023-11-20 05:18:49,760 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.985e+01 8.290e+01 8.982e+01 9.769e+01 2.008e+02, threshold=1.796e+02, percent-clipped=1.0 2023-11-20 05:18:55,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=955920.0, ans=0.95 2023-11-20 05:19:00,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=955920.0, ans=0.125 2023-11-20 05:19:05,724 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 143400 2023-11-20 05:19:13,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=955986.6666666666, ans=0.0 2023-11-20 05:19:18,863 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 11150, loss[loss=0.06991, simple_loss=0.07909, pruned_loss=0.0164, audio_tagging_loss=0.01396, over 16613.00 frames. ], tot_loss[loss=0.08197, simple_loss=0.1018, pruned_loss=0.02062, audio_tagging_loss=0.01047, over 3049182.60 frames. ], batch size: 62, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:19:29,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=956053.3333333334, ans=0.0 2023-11-20 05:19:34,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=956120.0, ans=0.1 2023-11-20 05:19:41,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=956120.0, ans=0.5 2023-11-20 05:19:50,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=956186.6666666666, ans=0.5 2023-11-20 05:19:52,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=956186.6666666666, ans=0.125 2023-11-20 05:19:56,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=956253.3333333334, ans=0.0 2023-11-20 05:19:57,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=956253.3333333334, ans=0.0 2023-11-20 05:20:10,313 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 143450 2023-11-20 05:20:23,238 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 11200, loss[loss=0.08421, simple_loss=0.1069, pruned_loss=0.02251, audio_tagging_loss=0.008241, over 14964.00 frames. ], tot_loss[loss=0.08147, simple_loss=0.101, pruned_loss=0.02047, audio_tagging_loss=0.01051, over 3044357.00 frames. ], batch size: 55, lr: 5.66e-03, grad_scale: 32.0 2023-11-20 05:20:32,584 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.11 vs. 
limit=15.0 2023-11-20 05:20:34,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=956453.3333333334, ans=0.125 2023-11-20 05:20:59,540 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.186e+01 8.107e+01 8.782e+01 9.452e+01 1.606e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-20 05:21:14,923 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 143500 2023-11-20 05:21:17,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=956653.3333333334, ans=0.1 2023-11-20 05:21:27,546 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 11250, loss[loss=0.1007, simple_loss=0.1283, pruned_loss=0.03006, audio_tagging_loss=0.006479, over 15326.00 frames. ], tot_loss[loss=0.08108, simple_loss=0.1006, pruned_loss=0.02032, audio_tagging_loss=0.01048, over 3045767.16 frames. ], batch size: 54, lr: 5.66e-03, grad_scale: 32.0 2023-11-20 05:21:39,938 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.63 vs. limit=15.0 2023-11-20 05:21:40,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=956786.6666666666, ans=0.0 2023-11-20 05:22:19,689 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 143550 2023-11-20 05:22:31,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=957053.3333333334, ans=0.125 2023-11-20 05:22:32,435 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 11300, loss[loss=0.0873, simple_loss=0.119, pruned_loss=0.01991, audio_tagging_loss=0.007914, over 16263.00 frames. ], tot_loss[loss=0.08148, simple_loss=0.1014, pruned_loss=0.02051, audio_tagging_loss=0.01026, over 3041412.55 frames. ], batch size: 59, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:22:36,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=957053.3333333334, ans=0.0 2023-11-20 05:22:46,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=957120.0, ans=0.125 2023-11-20 05:22:49,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=957120.0, ans=0.125 2023-11-20 05:22:58,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=957186.6666666666, ans=0.0 2023-11-20 05:23:05,550 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.64 vs. 
limit=15.0 2023-11-20 05:23:10,353 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.444e+01 8.144e+01 8.778e+01 9.554e+01 1.373e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-20 05:23:16,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=957253.3333333334, ans=0.125 2023-11-20 05:23:21,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=957253.3333333334, ans=0.0 2023-11-20 05:23:24,889 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 143600 2023-11-20 05:23:29,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=957320.0, ans=15.0 2023-11-20 05:23:37,256 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 11350, loss[loss=0.06893, simple_loss=0.09115, pruned_loss=0.012, audio_tagging_loss=0.01136, over 14555.00 frames. ], tot_loss[loss=0.08164, simple_loss=0.1019, pruned_loss=0.02062, audio_tagging_loss=0.01007, over 3039230.09 frames. ], batch size: 54, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:23:46,962 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.49 vs. limit=15.0 2023-11-20 05:23:57,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=957453.3333333334, ans=0.125 2023-11-20 05:24:06,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=957520.0, ans=0.2 2023-11-20 05:24:07,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=957520.0, ans=0.125 2023-11-20 05:24:15,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=957586.6666666666, ans=0.125 2023-11-20 05:24:18,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=957586.6666666666, ans=0.1 2023-11-20 05:24:29,261 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 143650 2023-11-20 05:24:42,582 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 11400, loss[loss=0.07071, simple_loss=0.09417, pruned_loss=0.01526, audio_tagging_loss=0.00837, over 15132.00 frames. ], tot_loss[loss=0.08134, simple_loss=0.1018, pruned_loss=0.02048, audio_tagging_loss=0.009951, over 3041915.85 frames. ], batch size: 58, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:24:59,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=957786.6666666666, ans=0.125 2023-11-20 05:25:19,461 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.590e+01 8.041e+01 8.720e+01 9.566e+01 1.309e+02, threshold=1.744e+02, percent-clipped=0.0 2023-11-20 05:25:22,070 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.49 vs. 
limit=6.0 2023-11-20 05:25:34,553 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 143700 2023-11-20 05:25:35,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=957986.6666666666, ans=0.1 2023-11-20 05:25:42,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=957986.6666666666, ans=0.1 2023-11-20 05:25:46,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=958053.3333333334, ans=0.1 2023-11-20 05:25:47,431 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 11450, loss[loss=0.08356, simple_loss=0.1084, pruned_loss=0.02064, audio_tagging_loss=0.008739, over 15486.00 frames. ], tot_loss[loss=0.08114, simple_loss=0.1015, pruned_loss=0.02041, audio_tagging_loss=0.009967, over 3045891.50 frames. ], batch size: 55, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:26:06,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=958120.0, ans=0.0 2023-11-20 05:26:28,557 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.71 vs. limit=22.5 2023-11-20 05:26:38,721 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 143750 2023-11-20 05:26:51,457 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 11500, loss[loss=0.06936, simple_loss=0.08843, pruned_loss=0.01612, audio_tagging_loss=0.009026, over 15296.00 frames. ], tot_loss[loss=0.08078, simple_loss=0.101, pruned_loss=0.02027, audio_tagging_loss=0.01002, over 3047004.93 frames. ], batch size: 58, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:26:52,241 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.73 vs. limit=12.0 2023-11-20 05:27:04,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=958453.3333333334, ans=0.1 2023-11-20 05:27:12,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=958453.3333333334, ans=0.1 2023-11-20 05:27:29,100 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.118e+01 7.975e+01 8.821e+01 9.410e+01 1.240e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-20 05:27:42,785 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 143800 2023-11-20 05:27:55,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=958720.0, ans=0.1 2023-11-20 05:27:56,027 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 11550, loss[loss=0.07382, simple_loss=0.09624, pruned_loss=0.0153, audio_tagging_loss=0.0104, over 15993.00 frames. ], tot_loss[loss=0.0808, simple_loss=0.101, pruned_loss=0.0202, audio_tagging_loss=0.0101, over 3045907.31 frames. ], batch size: 59, lr: 5.65e-03, grad_scale: 16.0 2023-11-20 05:28:12,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=958786.6666666666, ans=0.2 2023-11-20 05:28:12,537 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.44 vs. 
limit=15.0 2023-11-20 05:28:13,826 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.70 vs. limit=15.0 2023-11-20 05:28:24,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=958853.3333333334, ans=0.1 2023-11-20 05:28:36,091 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 05:28:41,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=958920.0, ans=0.125 2023-11-20 05:28:48,350 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 143850 2023-11-20 05:28:54,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=958986.6666666666, ans=0.125 2023-11-20 05:29:01,025 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 11600, loss[loss=0.1244, simple_loss=0.1557, pruned_loss=0.03756, audio_tagging_loss=0.009036, over 15796.00 frames. ], tot_loss[loss=0.08095, simple_loss=0.1012, pruned_loss=0.02037, audio_tagging_loss=0.009996, over 3051800.97 frames. ], batch size: 55, lr: 5.65e-03, grad_scale: 32.0 2023-11-20 05:29:19,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=959120.0, ans=0.2 2023-11-20 05:29:38,409 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.843e+01 8.207e+01 8.809e+01 9.519e+01 1.148e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-20 05:29:52,780 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 143900 2023-11-20 05:30:05,599 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 11650, loss[loss=0.09258, simple_loss=0.121, pruned_loss=0.02285, audio_tagging_loss=0.00921, over 16060.00 frames. ], tot_loss[loss=0.08144, simple_loss=0.102, pruned_loss=0.0204, audio_tagging_loss=0.01006, over 3055407.08 frames. ], batch size: 60, lr: 5.65e-03, grad_scale: 32.0 2023-11-20 05:30:11,137 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.16 vs. limit=22.5 2023-11-20 05:30:32,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=959520.0, ans=0.125 2023-11-20 05:30:57,365 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 143950 2023-11-20 05:31:09,596 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 11700, loss[loss=0.07946, simple_loss=0.09687, pruned_loss=0.02054, audio_tagging_loss=0.01049, over 14716.00 frames. ], tot_loss[loss=0.08163, simple_loss=0.1023, pruned_loss=0.02047, audio_tagging_loss=0.01002, over 3056050.37 frames. 
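The grad_scale field that flips between 16.0 and 32.0 across these entries is fp16 dynamic loss scaling (use_fp16=True in this run): the scale is doubled after a stretch of overflow-free steps and halved again when an inf/nan gradient is hit. The standard PyTorch pattern, shown here on a toy model and assuming a CUDA device like the cuda:0 this run uses:

    import torch

    model = torch.nn.Linear(80, 512).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    scaler = torch.cuda.amp.GradScaler(init_scale=16.0)  # cf. grad_scale: 16.0

    def train_step(features: torch.Tensor, targets: torch.Tensor):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = torch.nn.functional.mse_loss(model(features), targets)
        scaler.scale(loss).backward()  # gradients carry the current scale
        scaler.step(optimizer)         # unscales; skips the step on inf/nan
        scaler.update()                # grows the scale periodically (16 -> 32),
                                       # halves it again after an overflow
        return loss.detach()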
], batch size: 58, lr: 5.65e-03, grad_scale: 32.0 2023-11-20 05:31:24,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=959786.6666666666, ans=0.125 2023-11-20 05:31:25,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=959786.6666666666, ans=0.2 2023-11-20 05:31:29,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=959786.6666666666, ans=0.125 2023-11-20 05:31:31,085 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.28 vs. limit=15.0 2023-11-20 05:31:31,216 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.92 vs. limit=12.0 2023-11-20 05:31:41,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=959853.3333333334, ans=0.0 2023-11-20 05:31:46,953 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.116e+01 8.169e+01 8.905e+01 9.543e+01 1.392e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-20 05:31:54,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=959920.0, ans=0.09899494936611666 2023-11-20 05:32:00,443 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 144000 2023-11-20 05:32:01,887 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-144000.pt 2023-11-20 05:32:10,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=959986.6666666666, ans=0.125 2023-11-20 05:32:17,065 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 11750, loss[loss=0.09975, simple_loss=0.1257, pruned_loss=0.02806, audio_tagging_loss=0.008851, over 15857.00 frames. ], tot_loss[loss=0.08198, simple_loss=0.1023, pruned_loss=0.02069, audio_tagging_loss=0.01013, over 3052782.51 frames. ], batch size: 58, lr: 5.65e-03, grad_scale: 32.0 2023-11-20 05:32:18,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=960053.3333333334, ans=0.125 2023-11-20 05:32:41,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=960186.6666666666, ans=0.2 2023-11-20 05:32:45,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=960186.6666666666, ans=0.1 2023-11-20 05:33:08,909 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 144050 2023-11-20 05:33:21,606 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 11800, loss[loss=0.1033, simple_loss=0.135, pruned_loss=0.02805, audio_tagging_loss=0.00772, over 15590.00 frames. ], tot_loss[loss=0.08195, simple_loss=0.1022, pruned_loss=0.02077, audio_tagging_loss=0.01007, over 3050479.66 frames. 
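Batch 144000 triggers the checkpoint.py:75 save above because it is a multiple of save_every_n=4000; with keep_last_k=30, older checkpoint-*.pt files are pruned as new ones arrive. A sketch of that cadence; the helper name and the saved payload are illustrative, not icefall's exact API:

    from pathlib import Path

    def maybe_save(batch_idx: int, exp_dir: Path,
                   save_every_n: int = 4000, keep_last_k: int = 30) -> None:
        if batch_idx == 0 or batch_idx % save_every_n != 0:
            return
        path = exp_dir / f"checkpoint-{batch_idx}.pt"
        # torch.save({...model/optimizer/sampler state...}, path)
        old = sorted(exp_dir.glob("checkpoint-*.pt"),
                     key=lambda p: int(p.stem.split("-")[1]))
        for stale in old[:-keep_last_k]:
            stale.unlink()  # keep only the newest keep_last_k checkpoints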
], batch size: 59, lr: 5.65e-03, grad_scale: 16.0 2023-11-20 05:33:57,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=960520.0, ans=0.125 2023-11-20 05:33:59,969 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.293e+01 8.347e+01 8.824e+01 9.407e+01 1.316e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-20 05:34:02,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=960586.6666666666, ans=0.125 2023-11-20 05:34:13,017 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 144100 2023-11-20 05:34:17,340 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.89 vs. limit=10.0 2023-11-20 05:34:25,254 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 11850, loss[loss=0.08101, simple_loss=0.1074, pruned_loss=0.01894, audio_tagging_loss=0.008397, over 15826.00 frames. ], tot_loss[loss=0.08194, simple_loss=0.1022, pruned_loss=0.02075, audio_tagging_loss=0.0101, over 3046369.39 frames. ], batch size: 57, lr: 5.65e-03, grad_scale: 16.0 2023-11-20 05:34:30,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=960720.0, ans=0.025 2023-11-20 05:35:07,374 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.11 vs. limit=15.0 2023-11-20 05:35:16,557 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 144150 2023-11-20 05:35:27,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=960986.6666666666, ans=0.125 2023-11-20 05:35:29,874 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 11900, loss[loss=0.1047, simple_loss=0.1363, pruned_loss=0.02728, audio_tagging_loss=0.009324, over 17186.00 frames. ], tot_loss[loss=0.08188, simple_loss=0.1019, pruned_loss=0.02076, audio_tagging_loss=0.01018, over 3046840.19 frames. ], batch size: 61, lr: 5.65e-03, grad_scale: 16.0 2023-11-20 05:35:36,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=961053.3333333334, ans=0.1 2023-11-20 05:35:46,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=961120.0, ans=0.125 2023-11-20 05:36:07,724 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.082e+01 8.202e+01 8.808e+01 9.616e+01 1.545e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-20 05:36:21,915 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 144200 2023-11-20 05:36:27,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=961320.0, ans=0.0 2023-11-20 05:36:29,093 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.92 vs. limit=22.5 2023-11-20 05:36:35,100 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 11950, loss[loss=0.1028, simple_loss=0.1298, pruned_loss=0.02719, audio_tagging_loss=0.01074, over 15107.00 frames. ], tot_loss[loss=0.08139, simple_loss=0.1011, pruned_loss=0.02046, audio_tagging_loss=0.0104, over 3042272.07 frames. 
], batch size: 56, lr: 5.65e-03, grad_scale: 16.0 2023-11-20 05:36:36,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=961386.6666666666, ans=0.125 2023-11-20 05:36:40,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=961386.6666666666, ans=0.05 2023-11-20 05:36:53,894 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 05:37:15,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=961586.6666666666, ans=0.07 2023-11-20 05:37:17,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=961586.6666666666, ans=0.125 2023-11-20 05:37:23,636 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.52 vs. limit=22.5 2023-11-20 05:37:25,347 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 144250 2023-11-20 05:37:37,575 INFO [train_asr.py:1262] (0/4) Epoch 12, batch 12000, loss[loss=0.07316, simple_loss=0.08123, pruned_loss=0.01959, audio_tagging_loss=0.01295, over 14212.00 frames. ], tot_loss[loss=0.08139, simple_loss=0.101, pruned_loss=0.02038, audio_tagging_loss=0.01051, over 3043773.91 frames. ], batch size: 56, lr: 5.65e-03, grad_scale: 32.0 2023-11-20 05:37:37,578 INFO [train_asr.py:1285] (0/4) Computing validation loss 2023-11-20 05:38:18,777 INFO [train_asr.py:1294] (0/4) Epoch 12, validation: loss=0.06309, simple_loss=0.0542, pruned_loss=0.005937, audio_tagging_loss=0.03005, over 4681554.00 frames. 2023-11-20 05:38:18,778 INFO [train_asr.py:1295] (0/4) Maximum memory allocated so far is 25925MB 2023-11-20 05:38:36,249 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.60 vs. limit=15.0 2023-11-20 05:38:49,327 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-12.pt 2023-11-20 05:39:27,283 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 0, loss[loss=0.07205, simple_loss=0.07252, pruned_loss=0.01429, audio_tagging_loss=0.0215, over 15841.00 frames. ], tot_loss[loss=0.07205, simple_loss=0.07252, pruned_loss=0.01429, audio_tagging_loss=0.0215, over 15841.00 frames. ], batch size: 61, lr: 5.43e-03, grad_scale: 32.0 2023-11-20 05:39:27,287 INFO [train_asr.py:1285] (0/4) Computing validation loss 2023-11-20 05:40:04,330 INFO [train_asr.py:1294] (0/4) Epoch 13, validation: loss=0.06272, simple_loss=0.05429, pruned_loss=0.006071, audio_tagging_loss=0.02951, over 4681554.00 frames. 2023-11-20 05:40:04,331 INFO [train_asr.py:1295] (0/4) Maximum memory allocated so far is 25925MB 2023-11-20 05:40:04,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=961886.6666666666, ans=0.125 2023-11-20 05:40:10,457 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.395e+01 8.159e+01 8.856e+01 9.666e+01 1.294e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-20 05:40:11,039 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. 
limit=6.0 2023-11-20 05:40:22,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=961953.3333333334, ans=0.1 2023-11-20 05:40:23,191 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 144300 2023-11-20 05:40:30,131 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.44 vs. limit=15.0 2023-11-20 05:40:34,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=962020.0, ans=0.2 2023-11-20 05:41:09,223 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 50, loss[loss=0.07539, simple_loss=0.07492, pruned_loss=0.01721, audio_tagging_loss=0.02073, over 14795.00 frames. ], tot_loss[loss=0.09064, simple_loss=0.1007, pruned_loss=0.02037, audio_tagging_loss=0.01994, over 690719.08 frames. ], batch size: 57, lr: 5.42e-03, grad_scale: 16.0 2023-11-20 05:41:18,489 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.10 vs. limit=22.5 2023-11-20 05:41:20,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=962286.6666666666, ans=0.0 2023-11-20 05:41:28,977 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 144350 2023-11-20 05:41:34,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=962353.3333333334, ans=0.0 2023-11-20 05:41:36,857 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.72 vs. limit=10.0 2023-11-20 05:41:46,081 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.29 vs. limit=15.0 2023-11-20 05:41:46,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=962420.0, ans=0.025 2023-11-20 05:41:55,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=962420.0, ans=0.125 2023-11-20 05:42:03,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=962486.6666666666, ans=0.125 2023-11-20 05:42:05,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=962486.6666666666, ans=0.1 2023-11-20 05:42:05,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=962486.6666666666, ans=0.125 2023-11-20 05:42:09,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=962486.6666666666, ans=0.0 2023-11-20 05:42:11,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=962553.3333333334, ans=0.0 2023-11-20 05:42:12,677 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 100, loss[loss=0.09091, simple_loss=0.1048, pruned_loss=0.02085, audio_tagging_loss=0.01764, over 14634.00 frames. ], tot_loss[loss=0.09007, simple_loss=0.1008, pruned_loss=0.02044, audio_tagging_loss=0.01922, over 1209099.91 frames. 
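The validation entries just above ("Computing validation loss", a frame-weighted total over 4681554.00 frames, then the peak-memory line) follow the usual pattern: run the dev set under torch.no_grad(), aggregate losses weighted by frame counts, and report torch.cuda.max_memory_allocated(). A hedged sketch; the model interface shown is assumed, not icefall's:

    import torch

    @torch.no_grad()
    def compute_validation_loss(model, valid_loader, device) -> float:
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        for batch in valid_loader:
            # assumed interface: returns (mean loss, num frames) per batch
            loss, num_frames = model(batch)
            tot_loss += loss.item() * num_frames
            tot_frames += num_frames
        model.train()
        mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"validation: loss={tot_loss / tot_frames:.5f}; "
              f"Maximum memory allocated so far is {mb}MB")
        return tot_loss / tot_frames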
], batch size: 55, lr: 5.42e-03, grad_scale: 16.0 2023-11-20 05:42:18,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=962553.3333333334, ans=0.0 2023-11-20 05:42:21,356 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.568e+01 8.954e+01 9.518e+01 1.027e+02 1.327e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-20 05:42:22,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=962553.3333333334, ans=0.125 2023-11-20 05:42:33,882 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 144400 2023-11-20 05:42:44,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=962686.6666666666, ans=0.125 2023-11-20 05:42:54,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=962753.3333333334, ans=0.125 2023-11-20 05:42:59,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=962753.3333333334, ans=0.0 2023-11-20 05:43:00,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=962753.3333333334, ans=0.1 2023-11-20 05:43:00,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=962753.3333333334, ans=0.2 2023-11-20 05:43:08,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=962820.0, ans=0.2 2023-11-20 05:43:12,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=962820.0, ans=0.125 2023-11-20 05:43:14,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=962820.0, ans=0.1 2023-11-20 05:43:18,743 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 150, loss[loss=0.06764, simple_loss=0.06481, pruned_loss=0.01317, audio_tagging_loss=0.02206, over 15474.00 frames. ], tot_loss[loss=0.08861, simple_loss=0.102, pruned_loss=0.02055, audio_tagging_loss=0.01706, over 1622714.03 frames. ], batch size: 61, lr: 5.42e-03, grad_scale: 16.0 2023-11-20 05:43:26,822 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.20 vs. limit=6.0 2023-11-20 05:43:38,540 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 144450 2023-11-20 05:43:39,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=962953.3333333334, ans=0.125 2023-11-20 05:43:51,476 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.57 vs. limit=15.0 2023-11-20 05:44:13,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=963153.3333333334, ans=0.0 2023-11-20 05:44:24,352 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 200, loss[loss=0.08154, simple_loss=0.1008, pruned_loss=0.02089, audio_tagging_loss=0.01028, over 14810.00 frames. ], tot_loss[loss=0.08672, simple_loss=0.1023, pruned_loss=0.02059, audio_tagging_loss=0.01499, over 1943717.72 frames. 
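The learning rates logged here are consistent with icefall's Eden schedule using base_lr=0.045, lr_batches=7500, lr_epochs=3.5 and a zero-based epoch index: it reproduces both the 5.68e-03 seen through epoch 12 (index 11, batch near 142600) and the drop to 5.43e-03 / 5.42e-03 at the start of epoch 13 (index 12, batch near 144300). The zero-based indexing is our inference from the numbers:

    # Eden learning-rate factor (cf. icefall's optim.py); the zero-based
    # epoch index is an assumption, but it matches the logged values.
    def eden_lr(base_lr: float, batch: int, epoch: int,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        batch_f = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_f = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_f * epoch_f

    print(f"{eden_lr(0.045, 142600, 11):.2e}")  # 5.68e-03 (epoch 12 above)
    print(f"{eden_lr(0.045, 144300, 12):.2e}")  # 5.43e-03 (epoch 13, batch 0)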
], batch size: 57, lr: 5.42e-03, grad_scale: 16.0 2023-11-20 05:44:28,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=963220.0, ans=0.025 2023-11-20 05:44:30,940 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 05:44:31,992 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.261e+01 8.317e+01 9.168e+01 9.939e+01 1.407e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-20 05:44:34,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=963220.0, ans=0.125 2023-11-20 05:44:43,894 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 144500 2023-11-20 05:45:21,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=963486.6666666666, ans=0.125 2023-11-20 05:45:26,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=963486.6666666666, ans=10.0 2023-11-20 05:45:28,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=963553.3333333334, ans=0.125 2023-11-20 05:45:28,837 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 250, loss[loss=0.08862, simple_loss=0.1151, pruned_loss=0.02096, audio_tagging_loss=0.0101, over 14696.00 frames. ], tot_loss[loss=0.0846, simple_loss=0.1014, pruned_loss=0.02022, audio_tagging_loss=0.01365, over 2179328.99 frames. ], batch size: 55, lr: 5.42e-03, grad_scale: 16.0 2023-11-20 05:45:33,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=963553.3333333334, ans=0.125 2023-11-20 05:45:49,346 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 144550 2023-11-20 05:45:57,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=963686.6666666666, ans=0.125 2023-11-20 05:45:57,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=963686.6666666666, ans=0.025 2023-11-20 05:46:01,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=963686.6666666666, ans=0.0 2023-11-20 05:46:05,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=963686.6666666666, ans=0.0 2023-11-20 05:46:08,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=963753.3333333334, ans=0.125 2023-11-20 05:46:09,498 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.79 vs. limit=15.0 2023-11-20 05:46:20,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=963820.0, ans=0.125 2023-11-20 05:46:34,422 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 300, loss[loss=0.06766, simple_loss=0.08509, pruned_loss=0.01641, audio_tagging_loss=0.0087, over 14705.00 frames. ], tot_loss[loss=0.08409, simple_loss=0.1021, pruned_loss=0.02051, audio_tagging_loss=0.01253, over 2369874.36 frames. 
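The "tot_loss ... over N frames" figures are not plain epoch averages: N climbs from roughly 6.9e5 frames at batch 50 of epoch 13 toward a plateau near 3.0e6 frames mid-epoch, which is what an exponentially decayed accumulator with decay 1 - 1/reset_interval (reset_interval=200 in the config, about 15k frames per batch) produces. A hedged sketch of that bookkeeping; icefall's MetricsTracker decays all of its sums, including the frame count, in one step:

    def update_tot(tot_loss_sum: float, tot_frames: float,
                   batch_loss: float, batch_frames: float,
                   reset_interval: int = 200):
        # Decay the running sums, then add the new batch. At steady state
        # tot_frames ~ reset_interval * batch_frames ~ 3.0e6, matching the
        # mid-epoch entries above; just after an epoch reset it is small.
        decay = 1.0 - 1.0 / reset_interval
        tot_loss_sum = tot_loss_sum * decay + batch_loss * batch_frames
        tot_frames = tot_frames * decay + batch_frames
        return tot_loss_sum, tot_frames  # reported loss = sum / frames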
], batch size: 55, lr: 5.42e-03, grad_scale: 16.0 2023-11-20 05:46:42,580 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.680e+01 8.488e+01 9.150e+01 9.824e+01 1.478e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-20 05:46:47,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=963953.3333333334, ans=0.125 2023-11-20 05:46:48,806 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.35 vs. limit=15.0 2023-11-20 05:46:54,374 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 144600 2023-11-20 05:47:23,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=964086.6666666666, ans=0.05 2023-11-20 05:47:40,320 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 350, loss[loss=0.07184, simple_loss=0.08777, pruned_loss=0.01971, audio_tagging_loss=0.008248, over 14505.00 frames. ], tot_loss[loss=0.08273, simple_loss=0.1008, pruned_loss=0.02038, audio_tagging_loss=0.01196, over 2526753.08 frames. ], batch size: 55, lr: 5.42e-03, grad_scale: 16.0 2023-11-20 05:47:43,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=964220.0, ans=0.125 2023-11-20 05:47:46,973 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.27 vs. limit=15.0 2023-11-20 05:47:47,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=964220.0, ans=0.125 2023-11-20 05:47:58,747 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 144650 2023-11-20 05:48:12,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=964353.3333333334, ans=0.2 2023-11-20 05:48:19,608 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.11 vs. limit=22.5 2023-11-20 05:48:21,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=964420.0, ans=0.035 2023-11-20 05:48:42,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=964486.6666666666, ans=0.0 2023-11-20 05:48:43,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=964553.3333333334, ans=0.0 2023-11-20 05:48:44,438 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 400, loss[loss=0.08281, simple_loss=0.1039, pruned_loss=0.02124, audio_tagging_loss=0.009649, over 15058.00 frames. ], tot_loss[loss=0.08184, simple_loss=0.1006, pruned_loss=0.02011, audio_tagging_loss=0.01143, over 2646742.26 frames. 
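[annotation] The scaling.py:213 lines print the current value (`ans`) of a ScheduledFloat at the given `batch_count`: dropout rates (`dropout_p`, 0.1 here), module skip rates, balancer probabilities, and even whitening limits are all scheduled this way. A hedged sketch of the idea, as piecewise-linear interpolation over batch_count (the real class is in icefall's scaling.py; the breakpoints below are illustrative, not this run's):

    def scheduled_float(batch_count, points):
        """Piecewise-linear schedule over (batch_count, value) breakpoints,
        clamped at both ends -- a sketch of ScheduledFloat's behavior."""
        (x0, y0) = points[0]
        if batch_count <= x0:
            return y0
        for (x1, y1) in points[1:]:
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
            x0, y0 = x1, y1
        return y0

    # e.g. a dropout decaying from 0.3 to a floor of 0.1:
    print(scheduled_float(962753.33, [(0.0, 0.3), (20000.0, 0.1)]))  # -> 0.1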
], batch size: 57, lr: 5.42e-03, grad_scale: 32.0 2023-11-20 05:48:48,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=964553.3333333334, ans=0.1 2023-11-20 05:48:48,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=964553.3333333334, ans=0.0 2023-11-20 05:48:52,438 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.722e+01 8.150e+01 8.876e+01 9.638e+01 1.255e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-20 05:48:52,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=964553.3333333334, ans=0.035 2023-11-20 05:49:04,179 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 144700 2023-11-20 05:49:19,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=964686.6666666666, ans=0.5 2023-11-20 05:49:21,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=964686.6666666666, ans=0.0 2023-11-20 05:49:26,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=964753.3333333334, ans=0.0 2023-11-20 05:49:28,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=964753.3333333334, ans=15.0 2023-11-20 05:49:37,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=964820.0, ans=0.2 2023-11-20 05:49:37,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=964820.0, ans=0.0 2023-11-20 05:49:49,694 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 450, loss[loss=0.1048, simple_loss=0.1342, pruned_loss=0.02743, audio_tagging_loss=0.01025, over 15002.00 frames. ], tot_loss[loss=0.08113, simple_loss=0.1001, pruned_loss=0.01994, audio_tagging_loss=0.01115, over 2734208.16 frames. ], batch size: 57, lr: 5.42e-03, grad_scale: 32.0 2023-11-20 05:50:03,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=964953.3333333334, ans=0.125 2023-11-20 05:50:06,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=964953.3333333334, ans=0.1 2023-11-20 05:50:08,937 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 144750 2023-11-20 05:50:16,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=965020.0, ans=0.0 2023-11-20 05:50:22,160 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 05:50:35,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=965086.6666666666, ans=0.125 2023-11-20 05:50:51,388 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.53 vs. limit=15.0 2023-11-20 05:50:54,201 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 500, loss[loss=0.0816, simple_loss=0.1011, pruned_loss=0.02027, audio_tagging_loss=0.01076, over 15847.00 frames. 
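[annotation] `grad_scale` doubles from 16.0 to 32.0 at batch 400 and is back at 16.0 by batch 1000 further down: the usual fp16 dynamic loss-scaling behavior, where the scale grows after a run of overflow-free steps and halves when a step produces inf/nan gradients. A hedged sketch of the loop behind these values (init_scale and growth_interval here are illustrative, not necessarily this run's settings):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_factor=2.0,
                                       backoff_factor=0.5, growth_interval=2000)

    def train_step(model, optimizer, batch, compute_loss):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = compute_loss(model, batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)      # skipped if inf/nan grads are found
        scaler.update()             # grows after clean steps, halves on overflow
        return scaler.get_scale()   # the grad_scale printed in each batch record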
], tot_loss[loss=0.08098, simple_loss=0.1002, pruned_loss=0.02007, audio_tagging_loss=0.01083, over 2806713.87 frames. ], batch size: 59, lr: 5.42e-03, grad_scale: 32.0 2023-11-20 05:50:56,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=965220.0, ans=0.04949747468305833 2023-11-20 05:51:02,236 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.903e+01 8.170e+01 8.806e+01 9.563e+01 1.167e+02, threshold=1.761e+02, percent-clipped=0.0 2023-11-20 05:51:13,707 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 144800 2023-11-20 05:51:21,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=965353.3333333334, ans=0.2 2023-11-20 05:51:21,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=965353.3333333334, ans=0.125 2023-11-20 05:51:28,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=965353.3333333334, ans=0.125 2023-11-20 05:51:39,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=965420.0, ans=0.125 2023-11-20 05:51:59,464 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 550, loss[loss=0.07784, simple_loss=0.08964, pruned_loss=0.01981, audio_tagging_loss=0.01322, over 16754.00 frames. ], tot_loss[loss=0.08072, simple_loss=0.09997, pruned_loss=0.01999, audio_tagging_loss=0.01074, over 2861085.83 frames. ], batch size: 64, lr: 5.41e-03, grad_scale: 32.0 2023-11-20 05:51:59,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=965553.3333333334, ans=0.0 2023-11-20 05:52:19,374 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 144850 2023-11-20 05:52:20,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=965620.0, ans=0.2 2023-11-20 05:52:33,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=965686.6666666666, ans=0.035 2023-11-20 05:52:46,579 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.22 vs. limit=22.5 2023-11-20 05:53:01,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=965820.0, ans=0.025 2023-11-20 05:53:04,622 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 600, loss[loss=0.07109, simple_loss=0.08321, pruned_loss=0.01446, audio_tagging_loss=0.01503, over 16193.00 frames. ], tot_loss[loss=0.08043, simple_loss=0.09983, pruned_loss=0.01989, audio_tagging_loss=0.01062, over 2899910.31 frames. 
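[annotation] The learning rate ticks from 5.42e-03 to 5.41e-03 around batch 550. The printed values are consistent with icefall's Eden schedule, which decays with both the global step and the epoch count; plugging in this run's base_lr=0.045, lr_batches=7500, lr_epochs=3.5 reproduces the log if the epoch term is 12 (an assumption: the number of completed epochs while epoch 13 is running):

    def eden_lr(base_lr, step, epoch, lr_batches=7500.0, lr_epochs=3.5):
        # Hedged sketch of the Eden schedule used in icefall.
        batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # step ~ 144500 ("Current batch idx" above), epoch = 12 completed:
    print(f"{eden_lr(0.045, 144500, 12):.2e}")  # ~5.42e-03, as logged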
], batch size: 59, lr: 5.41e-03, grad_scale: 32.0 2023-11-20 05:53:12,765 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.448e+01 8.106e+01 8.665e+01 9.302e+01 1.312e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-20 05:53:16,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=965953.3333333334, ans=0.0 2023-11-20 05:53:24,017 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 144900 2023-11-20 05:53:25,640 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.11 vs. limit=15.0 2023-11-20 05:53:30,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=966020.0, ans=0.0 2023-11-20 05:53:35,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=966020.0, ans=0.125 2023-11-20 05:53:36,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=966020.0, ans=0.09899494936611666 2023-11-20 05:53:41,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=966020.0, ans=0.0 2023-11-20 05:54:10,300 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 650, loss[loss=0.05893, simple_loss=0.07901, pruned_loss=0.01285, audio_tagging_loss=0.006579, over 14400.00 frames. ], tot_loss[loss=0.0799, simple_loss=0.09922, pruned_loss=0.01976, audio_tagging_loss=0.01052, over 2922399.15 frames. ], batch size: 55, lr: 5.41e-03, grad_scale: 32.0 2023-11-20 05:54:11,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=966220.0, ans=0.125 2023-11-20 05:54:29,377 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 144950 2023-11-20 05:54:40,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=966353.3333333334, ans=0.1 2023-11-20 05:54:43,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=966353.3333333334, ans=0.2 2023-11-20 05:54:56,569 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.71 vs. limit=15.0 2023-11-20 05:55:08,862 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.74 vs. limit=12.0 2023-11-20 05:55:13,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=966553.3333333334, ans=0.125 2023-11-20 05:55:14,364 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 700, loss[loss=0.0871, simple_loss=0.1173, pruned_loss=0.02089, audio_tagging_loss=0.007575, over 14513.00 frames. ], tot_loss[loss=0.0802, simple_loss=0.09989, pruned_loss=0.01979, audio_tagging_loss=0.01047, over 2952340.04 frames. 
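[annotation] The scaling.py:1022 "Whitening" lines measure how far a group of channels is from having a white (decorrelated, equal-variance) covariance; the metric is ~1.0 for perfectly white features and the module penalizes gradients only when it exceeds the (itself scheduled) `limit`. A hedged re-implementation sketch of the covariance-based metric, reconstructed from the logged fields rather than copied from scaling.py:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
        """Mean squared covariance entry over squared mean variance, per
        channel group: 1.0 for whitened features, growing toward the
        group size as channels become correlated or unequal."""
        num_channels = x.shape[-1]
        per_group = num_channels // num_groups
        x = x.reshape(-1, num_groups, per_group).transpose(0, 1)  # (G, N, c)
        cov = torch.matmul(x.transpose(1, 2), x) / x.shape[1]     # (G, c, c)
        mean_diag = cov.diagonal(dim1=1, dim2=2).mean()
        mean_sq = (cov ** 2).mean() * per_group
        return (mean_sq / (mean_diag ** 2 + 1e-20)).item()

    # White noise scores ~1.0; the log flags values like 11.35 vs. limit=15.0:
    print(whitening_metric(torch.randn(10000, 384), num_groups=1))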
], batch size: 54, lr: 5.41e-03, grad_scale: 32.0 2023-11-20 05:55:22,272 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.647e+01 8.027e+01 8.645e+01 9.342e+01 1.133e+02, threshold=1.729e+02, percent-clipped=0.0 2023-11-20 05:55:34,889 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 145000 2023-11-20 05:55:42,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=966686.6666666666, ans=0.0 2023-11-20 05:55:43,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=966686.6666666666, ans=0.125 2023-11-20 05:55:44,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=966686.6666666666, ans=10.0 2023-11-20 05:56:02,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=966753.3333333334, ans=0.125 2023-11-20 05:56:03,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=966753.3333333334, ans=0.035 2023-11-20 05:56:10,539 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.44 vs. limit=15.0 2023-11-20 05:56:18,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=966820.0, ans=0.0 2023-11-20 05:56:21,180 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 750, loss[loss=0.09182, simple_loss=0.1115, pruned_loss=0.02663, audio_tagging_loss=0.009445, over 15321.00 frames. ], tot_loss[loss=0.0799, simple_loss=0.09926, pruned_loss=0.01979, audio_tagging_loss=0.01048, over 2975102.45 frames. ], batch size: 55, lr: 5.41e-03, grad_scale: 32.0 2023-11-20 05:56:26,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=966886.6666666666, ans=0.0 2023-11-20 05:56:37,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=966953.3333333334, ans=0.125 2023-11-20 05:56:40,476 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 145050 2023-11-20 05:56:40,953 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0 2023-11-20 05:57:08,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=967086.6666666666, ans=0.125 2023-11-20 05:57:14,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=967153.3333333334, ans=0.1 2023-11-20 05:57:16,031 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.30 vs. limit=12.0 2023-11-20 05:57:21,616 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.13 vs. limit=15.0 2023-11-20 05:57:25,817 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 800, loss[loss=0.0887, simple_loss=0.118, pruned_loss=0.02285, audio_tagging_loss=0.006862, over 15191.00 frames. ], tot_loss[loss=0.0811, simple_loss=0.1009, pruned_loss=0.0203, audio_tagging_loss=0.01036, over 2995033.77 frames. 
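[annotation] Many of the ScheduledFloat entries belong to Balancer modules (`min_positive`, `max_positive`, `min_abs`, `max_abs`, `prob`): these keep per-channel activation statistics inside a target range, and `prob` (typically 0.125 in these records) is the probability that the correction is applied on a given batch. A deliberately simplified sketch of *what* is constrained; the real module in icefall's scaling.py enforces this through a straight-through gradient modification rather than an explicit penalty:

    import torch

    def balancer_penalty(x, min_positive=0.05, max_positive=0.95, max_abs=10.0):
        # Per-channel positive-fraction and magnitude constraints,
        # expressed as a penalty purely for illustration.
        frac_pos = (x > 0).float().mean(dim=0)
        mean_abs = x.abs().mean(dim=0)
        penalty = ((min_positive - frac_pos).clamp(min=0.0)
                   + (frac_pos - max_positive).clamp(min=0.0)
                   + (mean_abs - max_abs).clamp(min=0.0))
        return penalty.sum()

    x = torch.randn(1000, 256)
    print(balancer_penalty(x).item())  # ~0 for well-behaved activations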
], batch size: 56, lr: 5.41e-03, grad_scale: 32.0 2023-11-20 05:57:27,612 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.23 vs. limit=15.0 2023-11-20 05:57:33,183 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.560e+01 8.396e+01 9.088e+01 9.762e+01 1.189e+02, threshold=1.818e+02, percent-clipped=0.0 2023-11-20 05:57:33,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=967220.0, ans=0.1 2023-11-20 05:57:38,854 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.21 vs. limit=15.0 2023-11-20 05:57:41,298 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.13 vs. limit=15.0 2023-11-20 05:57:44,468 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 145100 2023-11-20 05:57:55,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=967353.3333333334, ans=0.1 2023-11-20 05:58:02,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=967353.3333333334, ans=0.1 2023-11-20 05:58:07,658 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.88 vs. limit=10.0 2023-11-20 05:58:11,897 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.32 vs. limit=15.0 2023-11-20 05:58:18,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=967486.6666666666, ans=0.125 2023-11-20 05:58:29,589 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 850, loss[loss=0.07019, simple_loss=0.088, pruned_loss=0.01314, audio_tagging_loss=0.01306, over 15227.00 frames. ], tot_loss[loss=0.08124, simple_loss=0.1009, pruned_loss=0.02034, audio_tagging_loss=0.01046, over 3008683.53 frames. ], batch size: 56, lr: 5.41e-03, grad_scale: 32.0 2023-11-20 05:58:35,065 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.18 vs. limit=15.0 2023-11-20 05:58:36,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=967553.3333333334, ans=0.0 2023-11-20 05:58:42,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=967620.0, ans=0.0 2023-11-20 05:58:49,653 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 145150 2023-11-20 05:58:52,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=967620.0, ans=0.035 2023-11-20 05:59:12,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=967753.3333333334, ans=0.035 2023-11-20 05:59:34,783 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 900, loss[loss=0.1059, simple_loss=0.1375, pruned_loss=0.02954, audio_tagging_loss=0.007683, over 15477.00 frames. 
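[annotation] The frame counts in `tot_loss[... over N frames]` grow quickly early in the epoch (1.62M at batch 150) but saturate near 3.05M: consistent with an exponentially decayed accumulator where the running sums, frame count included, decay by (1 - 1/reset_interval) each batch with reset_interval=200, so N plateaus near 200 times the per-batch frame count. A hedged sketch that reproduces the numbers:

    class RunningLoss:
        """Decayed accumulator behind 'tot_loss[... over N frames]' (hedged
        reconstruction; icefall tracks this in a MetricsTracker)."""
        def __init__(self, reset_interval=200):
            self.decay = 1.0 - 1.0 / reset_interval
            self.frames = 0.0
            self.sums = {}

        def update(self, num_frames, **losses):
            self.frames = self.frames * self.decay + num_frames
            for k, v in losses.items():
                self.sums[k] = self.sums.get(k, 0.0) * self.decay + v * num_frames

        def averages(self):
            return {k: s / self.frames for k, s in self.sums.items()}

    tot = RunningLoss()
    for _ in range(150):
        tot.update(15400, loss=0.08)
    print(tot.frames)  # ~1.62e6, matching the batch-150 record earlier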
], tot_loss[loss=0.08184, simple_loss=0.1017, pruned_loss=0.02045, audio_tagging_loss=0.01056, over 3017005.45 frames. ], batch size: 57, lr: 5.41e-03, grad_scale: 32.0 2023-11-20 05:59:42,662 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 5.961e+01 8.143e+01 8.772e+01 9.521e+01 1.429e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-20 05:59:54,345 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 145200 2023-11-20 06:00:09,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=968020.0, ans=0.0 2023-11-20 06:00:20,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=968086.6666666666, ans=0.2 2023-11-20 06:00:20,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=968086.6666666666, ans=0.2 2023-11-20 06:00:29,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=968153.3333333334, ans=0.125 2023-11-20 06:00:40,226 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 950, loss[loss=0.07089, simple_loss=0.08989, pruned_loss=0.01621, audio_tagging_loss=0.009731, over 15331.00 frames. ], tot_loss[loss=0.08196, simple_loss=0.1021, pruned_loss=0.02059, audio_tagging_loss=0.01033, over 3018666.51 frames. ], batch size: 57, lr: 5.41e-03, grad_scale: 32.0 2023-11-20 06:00:46,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=968220.0, ans=0.0 2023-11-20 06:00:47,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=968220.0, ans=0.125 2023-11-20 06:00:58,552 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 145250 2023-11-20 06:00:58,818 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 06:01:11,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=968353.3333333334, ans=0.2 2023-11-20 06:01:14,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=968353.3333333334, ans=0.125 2023-11-20 06:01:29,404 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.51 vs. limit=22.5 2023-11-20 06:01:31,767 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.17 vs. limit=15.0 2023-11-20 06:01:33,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=968486.6666666666, ans=0.0 2023-11-20 06:01:37,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=968486.6666666666, ans=0.125 2023-11-20 06:01:43,363 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 1000, loss[loss=0.08896, simple_loss=0.1138, pruned_loss=0.02373, audio_tagging_loss=0.008323, over 14663.00 frames. ], tot_loss[loss=0.08191, simple_loss=0.1021, pruned_loss=0.02067, audio_tagging_loss=0.0102, over 3021542.91 frames. 
], batch size: 58, lr: 5.41e-03, grad_scale: 16.0 2023-11-20 06:01:43,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=968553.3333333334, ans=0.125 2023-11-20 06:01:51,988 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.711e+01 7.989e+01 8.669e+01 9.928e+01 1.196e+02, threshold=1.734e+02, percent-clipped=0.0 2023-11-20 06:01:54,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=968620.0, ans=0.0 2023-11-20 06:01:57,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=968620.0, ans=0.125 2023-11-20 06:02:02,721 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 145300 2023-11-20 06:02:10,092 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 06:02:18,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=968686.6666666666, ans=0.1 2023-11-20 06:02:33,352 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.21 vs. limit=10.0 2023-11-20 06:02:37,245 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.67 vs. limit=15.0 2023-11-20 06:02:48,471 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 1050, loss[loss=0.07796, simple_loss=0.09389, pruned_loss=0.02211, audio_tagging_loss=0.008905, over 14665.00 frames. ], tot_loss[loss=0.08118, simple_loss=0.1011, pruned_loss=0.02047, audio_tagging_loss=0.01015, over 3027706.05 frames. ], batch size: 57, lr: 5.41e-03, grad_scale: 16.0 2023-11-20 06:02:48,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=968886.6666666666, ans=0.0 2023-11-20 06:03:09,057 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 145350 2023-11-20 06:03:12,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=968953.3333333334, ans=0.125 2023-11-20 06:03:12,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=968953.3333333334, ans=0.07 2023-11-20 06:03:13,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=968953.3333333334, ans=0.0 2023-11-20 06:03:13,304 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.20 vs. 
limit=15.0 2023-11-20 06:03:28,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=969086.6666666666, ans=0.125 2023-11-20 06:03:49,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=969153.3333333334, ans=0.015 2023-11-20 06:03:54,967 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 1100, loss[loss=0.08843, simple_loss=0.1152, pruned_loss=0.02198, audio_tagging_loss=0.008837, over 15095.00 frames. ], tot_loss[loss=0.08098, simple_loss=0.1012, pruned_loss=0.0203, audio_tagging_loss=0.01006, over 3034365.92 frames. ], batch size: 57, lr: 5.40e-03, grad_scale: 16.0 2023-11-20 06:03:57,477 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 06:04:03,626 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.580e+01 8.161e+01 8.676e+01 9.552e+01 1.363e+02, threshold=1.735e+02, percent-clipped=0.0 2023-11-20 06:04:13,723 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 145400 2023-11-20 06:04:19,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=969353.3333333334, ans=0.125 2023-11-20 06:04:28,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=969353.3333333334, ans=0.95 2023-11-20 06:04:56,851 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.15 vs. limit=15.0 2023-11-20 06:04:58,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=969553.3333333334, ans=0.1 2023-11-20 06:04:59,686 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 1150, loss[loss=0.06794, simple_loss=0.08133, pruned_loss=0.01475, audio_tagging_loss=0.01252, over 14446.00 frames. ], tot_loss[loss=0.0811, simple_loss=0.1018, pruned_loss=0.02032, audio_tagging_loss=0.00989, over 3033240.87 frames. 
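[annotation] The train_asr.py:1506 WARNINGs above drop AudioSet placeholder cuts: a 1-second cut has 100 feature frames, the convolutional front end reduces that to 23, but the dummy transcript tokenizes to 24 BPE tokens, and the transducer loss needs at least one encoder frame per label. A hedged sketch of such a filter (`sp` is a sentencepiece processor; the subsampling formula is the approximate zipformer front-end arithmetic, an assumption that does reproduce 100 -> 23):

    def keep_cut(cut, sp, subsampling=lambda T: ((T - 7) // 2 + 1) // 2):
        # Drop cuts whose token count exceeds the subsampled frame count.
        num_frames = cut.num_frames                        # 100 for a 1 s cut
        num_tokens = len(sp.encode(cut.supervisions[0].text, out_type=str))
        return num_tokens <= subsampling(num_frames)       # 24 <= 23 -> False

    # train_cuts = train_cuts.filter(lambda c: keep_cut(c, sp))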
], batch size: 54, lr: 5.40e-03, grad_scale: 16.0 2023-11-20 06:05:06,168 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 06:05:18,972 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 145450 2023-11-20 06:05:38,168 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 06:05:40,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=969753.3333333334, ans=0.2 2023-11-20 06:05:50,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=969820.0, ans=0.125 2023-11-20 06:05:51,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=969820.0, ans=0.1 2023-11-20 06:06:03,739 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 1200, loss[loss=0.07373, simple_loss=0.09781, pruned_loss=0.01498, audio_tagging_loss=0.00985, over 15599.00 frames. ], tot_loss[loss=0.081, simple_loss=0.1016, pruned_loss=0.02025, audio_tagging_loss=0.009948, over 3039520.48 frames. ], batch size: 59, lr: 5.40e-03, grad_scale: 32.0 2023-11-20 06:06:03,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=969886.6666666666, ans=0.125 2023-11-20 06:06:04,467 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.82 vs. limit=22.5 2023-11-20 06:06:09,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=969886.6666666666, ans=0.1 2023-11-20 06:06:12,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=969886.6666666666, ans=0.0 2023-11-20 06:06:13,707 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.636e+01 8.393e+01 8.955e+01 9.874e+01 3.263e+02, threshold=1.791e+02, percent-clipped=1.0 2023-11-20 06:06:24,501 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 145500 2023-11-20 06:06:31,302 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.50 vs. limit=15.0 2023-11-20 06:06:34,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=970020.0, ans=0.0 2023-11-20 06:06:43,707 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.73 vs. limit=15.0 2023-11-20 06:06:49,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=970086.6666666666, ans=0.0 2023-11-20 06:06:52,366 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.41 vs. 
limit=22.5 2023-11-20 06:07:01,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=970153.3333333334, ans=0.1 2023-11-20 06:07:07,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=970220.0, ans=0.125 2023-11-20 06:07:09,469 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 1250, loss[loss=0.07203, simple_loss=0.09789, pruned_loss=0.01398, audio_tagging_loss=0.009104, over 15519.00 frames. ], tot_loss[loss=0.08108, simple_loss=0.1016, pruned_loss=0.02026, audio_tagging_loss=0.01001, over 3040071.82 frames. ], batch size: 57, lr: 5.40e-03, grad_scale: 32.0 2023-11-20 06:07:09,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=970220.0, ans=0.2 2023-11-20 06:07:12,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=970220.0, ans=0.125 2023-11-20 06:07:14,455 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.79 vs. limit=6.0 2023-11-20 06:07:27,934 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2023-11-20 06:07:28,469 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 145550 2023-11-20 06:07:32,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=970286.6666666666, ans=0.125 2023-11-20 06:07:45,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=970353.3333333334, ans=0.125 2023-11-20 06:07:47,976 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.47 vs. limit=15.0 2023-11-20 06:07:56,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=970420.0, ans=0.125 2023-11-20 06:07:56,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=970420.0, ans=0.95 2023-11-20 06:08:05,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=970486.6666666666, ans=0.0 2023-11-20 06:08:09,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=970486.6666666666, ans=0.2 2023-11-20 06:08:13,552 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 1300, loss[loss=0.08265, simple_loss=0.09897, pruned_loss=0.02277, audio_tagging_loss=0.0104, over 15110.00 frames. ], tot_loss[loss=0.08061, simple_loss=0.101, pruned_loss=0.02008, audio_tagging_loss=0.01004, over 3044224.03 frames. ], batch size: 56, lr: 5.40e-03, grad_scale: 32.0 2023-11-20 06:08:17,971 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.32 vs. 
limit=15.0 2023-11-20 06:08:22,207 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.703e+01 8.020e+01 8.850e+01 9.786e+01 1.232e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-20 06:08:32,751 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 145600 2023-11-20 06:08:45,209 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.34 vs. limit=10.0 2023-11-20 06:08:50,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=970686.6666666666, ans=0.125 2023-11-20 06:09:17,778 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 1350, loss[loss=0.0779, simple_loss=0.09008, pruned_loss=0.02104, audio_tagging_loss=0.01182, over 14548.00 frames. ], tot_loss[loss=0.08002, simple_loss=0.1003, pruned_loss=0.01985, audio_tagging_loss=0.01002, over 3042722.86 frames. ], batch size: 56, lr: 5.40e-03, grad_scale: 32.0 2023-11-20 06:09:37,902 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 145650 2023-11-20 06:09:38,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=970953.3333333334, ans=0.04949747468305833 2023-11-20 06:09:46,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=971020.0, ans=0.0 2023-11-20 06:10:03,337 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 06:10:12,108 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.24 vs. limit=10.0 2023-11-20 06:10:22,955 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 1400, loss[loss=0.06537, simple_loss=0.08674, pruned_loss=0.01227, audio_tagging_loss=0.009729, over 16588.00 frames. ], tot_loss[loss=0.08062, simple_loss=0.1009, pruned_loss=0.02013, audio_tagging_loss=0.01004, over 3043571.12 frames. ], batch size: 61, lr: 5.40e-03, grad_scale: 32.0 2023-11-20 06:10:32,222 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.083e+01 8.085e+01 8.864e+01 9.771e+01 1.469e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-20 06:10:42,860 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 145700 2023-11-20 06:10:51,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=971353.3333333334, ans=0.125 2023-11-20 06:11:11,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=971420.0, ans=0.5 2023-11-20 06:11:27,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=971553.3333333334, ans=0.2 2023-11-20 06:11:28,530 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 1450, loss[loss=0.0897, simple_loss=0.1221, pruned_loss=0.02211, audio_tagging_loss=0.006539, over 14865.00 frames. 
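[annotation] The `whiten_keys` records apply the same whitening measure to attention keys in per-group fashion: num_groups=8 over 256 channels means each 32-dimensional group (plausibly one attention head's keys, an inference) is checked separately, against a tighter limit of 6.0 than the 15.0-22.5 used for feature whitening. Reusing the hedged `whitening_metric` sketch from the earlier annotation:

    import torch

    # keys: (batch*time, groups*dims_per_group); shapes here are illustrative.
    keys = torch.randn(4096, 256)
    print(whitening_metric(keys, num_groups=8))  # ~1.0; the log flags 5.79 vs. limit=6.0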
], tot_loss[loss=0.08036, simple_loss=0.1008, pruned_loss=0.01987, audio_tagging_loss=0.01008, over 3041938.79 frames. ], batch size: 55, lr: 5.40e-03, grad_scale: 32.0 2023-11-20 06:11:45,056 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.24 vs. limit=15.0 2023-11-20 06:11:46,889 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 145750 2023-11-20 06:12:00,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=971686.6666666666, ans=0.125 2023-11-20 06:12:02,100 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.56 vs. limit=15.0 2023-11-20 06:12:14,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=971753.3333333334, ans=0.05 2023-11-20 06:12:22,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=971820.0, ans=0.1 2023-11-20 06:12:31,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=971886.6666666666, ans=0.1 2023-11-20 06:12:32,282 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 1500, loss[loss=0.09705, simple_loss=0.1172, pruned_loss=0.02738, audio_tagging_loss=0.01108, over 15151.00 frames. ], tot_loss[loss=0.08077, simple_loss=0.1011, pruned_loss=0.02009, audio_tagging_loss=0.01013, over 3034378.65 frames. ], batch size: 59, lr: 5.40e-03, grad_scale: 32.0 2023-11-20 06:12:32,937 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.60 vs. limit=15.0 2023-11-20 06:12:41,375 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.185e+01 8.333e+01 8.756e+01 9.459e+01 1.276e+02, threshold=1.751e+02, percent-clipped=0.0 2023-11-20 06:12:51,851 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 145800 2023-11-20 06:13:04,980 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.89 vs. limit=15.0 2023-11-20 06:13:22,045 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.22 vs. limit=15.0 2023-11-20 06:13:24,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=972153.3333333334, ans=0.0 2023-11-20 06:13:37,303 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 1550, loss[loss=0.08874, simple_loss=0.1011, pruned_loss=0.02523, audio_tagging_loss=0.01299, over 15804.00 frames. ], tot_loss[loss=0.08181, simple_loss=0.1021, pruned_loss=0.02049, audio_tagging_loss=0.01025, over 3044651.02 frames. 
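[annotation] The `*_skip_rate` and `bypass.*` schedules seen throughout (`attention_skip_rate`, `conv_skip_rate`, `ff2_skip_rate`, `ff3_skip_rate`, `bypass.skip_rate`, `bypass.scale_min`) are Zipformer's stochastic-depth knobs: sub-modules are skipped with the scheduled probability during training, and each layer's output is mixed back toward its input through a learned bypass scale clamped below by `scale_min` (0.2 in these records). A hedged sketch of both mechanisms (names and the clamp details are illustrative; the real modules are in icefall's zipformer code):

    import torch

    def maybe_skip(module, x, skip_rate, training=True):
        # With probability skip_rate, drop the module's contribution.
        if training and torch.rand(()) < skip_rate:
            return x
        return x + module(x)

    class Bypass(torch.nn.Module):
        # out = x_in + scale * (x_out - x_in), with the learned per-channel
        # scale kept within [scale_min, 1.0].
        def __init__(self, dim, scale_min=0.2):
            super().__init__()
            self.scale = torch.nn.Parameter(torch.full((dim,), 0.5))
            self.scale_min = scale_min

        def forward(self, x_in, x_out):
            scale = self.scale.clamp(min=self.scale_min, max=1.0)
            return x_in + scale * (x_out - x_in)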
], batch size: 62, lr: 5.40e-03, grad_scale: 32.0 2023-11-20 06:13:39,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=972220.0, ans=15.0 2023-11-20 06:13:56,457 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 145850 2023-11-20 06:14:00,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=972286.6666666666, ans=0.125 2023-11-20 06:14:03,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=972353.3333333334, ans=0.125 2023-11-20 06:14:21,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=972420.0, ans=0.125 2023-11-20 06:14:42,102 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 1600, loss[loss=0.08123, simple_loss=0.1129, pruned_loss=0.01709, audio_tagging_loss=0.007709, over 15111.00 frames. ], tot_loss[loss=0.0822, simple_loss=0.1027, pruned_loss=0.02055, audio_tagging_loss=0.01033, over 3053590.40 frames. ], batch size: 56, lr: 5.40e-03, grad_scale: 32.0 2023-11-20 06:14:46,386 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.81 vs. limit=15.0 2023-11-20 06:14:46,715 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.71 vs. limit=15.0 2023-11-20 06:14:51,235 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.806e+01 8.279e+01 8.833e+01 9.772e+01 1.298e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-20 06:15:01,489 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 145900 2023-11-20 06:15:46,813 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 1650, loss[loss=0.07148, simple_loss=0.07692, pruned_loss=0.01794, audio_tagging_loss=0.01508, over 15238.00 frames. ], tot_loss[loss=0.08335, simple_loss=0.1039, pruned_loss=0.02098, audio_tagging_loss=0.0104, over 3053194.14 frames. ], batch size: 57, lr: 5.39e-03, grad_scale: 16.0 2023-11-20 06:15:49,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=972886.6666666666, ans=0.125 2023-11-20 06:16:06,547 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 145950 2023-11-20 06:16:06,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=972953.3333333334, ans=0.125 2023-11-20 06:16:20,220 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.54 vs. limit=22.5 2023-11-20 06:16:44,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=973153.3333333334, ans=0.125 2023-11-20 06:16:48,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=973153.3333333334, ans=0.0 2023-11-20 06:16:51,542 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 1700, loss[loss=0.08763, simple_loss=0.1093, pruned_loss=0.02575, audio_tagging_loss=0.007242, over 16323.00 frames. ], tot_loss[loss=0.08247, simple_loss=0.1027, pruned_loss=0.02064, audio_tagging_loss=0.01048, over 3046240.54 frames. 
], batch size: 61, lr: 5.39e-03, grad_scale: 16.0 2023-11-20 06:17:02,512 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.898e+01 8.052e+01 8.811e+01 9.587e+01 1.278e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-20 06:17:07,041 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.40 vs. limit=22.5 2023-11-20 06:17:11,385 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 146000 2023-11-20 06:17:26,258 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.18 vs. limit=15.0 2023-11-20 06:17:38,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=973420.0, ans=0.1 2023-11-20 06:17:40,372 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.82 vs. limit=15.0 2023-11-20 06:17:45,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=973486.6666666666, ans=0.125 2023-11-20 06:17:47,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=973486.6666666666, ans=0.2 2023-11-20 06:17:47,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=973486.6666666666, ans=0.125 2023-11-20 06:17:56,485 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 1750, loss[loss=0.0641, simple_loss=0.07737, pruned_loss=0.01755, audio_tagging_loss=0.007871, over 14696.00 frames. ], tot_loss[loss=0.08179, simple_loss=0.1019, pruned_loss=0.02045, audio_tagging_loss=0.01041, over 3045834.64 frames. ], batch size: 57, lr: 5.39e-03, grad_scale: 16.0 2023-11-20 06:18:15,685 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 146050 2023-11-20 06:18:19,922 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.03 vs. limit=15.0 2023-11-20 06:18:41,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=973753.3333333334, ans=0.0 2023-11-20 06:18:42,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=973753.3333333334, ans=0.07 2023-11-20 06:19:00,622 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 1800, loss[loss=0.05708, simple_loss=0.06479, pruned_loss=0.01207, audio_tagging_loss=0.01261, over 14893.00 frames. ], tot_loss[loss=0.0813, simple_loss=0.1012, pruned_loss=0.02034, audio_tagging_loss=0.01037, over 3047086.33 frames. 
], batch size: 57, lr: 5.39e-03, grad_scale: 16.0 2023-11-20 06:19:03,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=973886.6666666666, ans=0.125 2023-11-20 06:19:11,424 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.956e+01 8.340e+01 8.928e+01 9.770e+01 2.074e+02, threshold=1.786e+02, percent-clipped=1.0 2023-11-20 06:19:19,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=973953.3333333334, ans=0.0 2023-11-20 06:19:21,345 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 146100 2023-11-20 06:19:59,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=974153.3333333334, ans=0.05 2023-11-20 06:20:06,114 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 1850, loss[loss=0.08682, simple_loss=0.115, pruned_loss=0.02159, audio_tagging_loss=0.00774, over 15238.00 frames. ], tot_loss[loss=0.081, simple_loss=0.1008, pruned_loss=0.02021, audio_tagging_loss=0.01042, over 3040987.29 frames. ], batch size: 58, lr: 5.39e-03, grad_scale: 16.0 2023-11-20 06:20:19,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=974286.6666666666, ans=6.0 2023-11-20 06:20:22,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=974286.6666666666, ans=0.1 2023-11-20 06:20:25,772 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 146150 2023-11-20 06:20:25,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=974286.6666666666, ans=0.0 2023-11-20 06:20:32,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=974353.3333333334, ans=0.125 2023-11-20 06:20:44,612 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.40 vs. limit=15.0 2023-11-20 06:20:52,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=974420.0, ans=0.0 2023-11-20 06:21:10,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=974553.3333333334, ans=0.125 2023-11-20 06:21:11,585 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 1900, loss[loss=0.08741, simple_loss=0.1097, pruned_loss=0.02297, audio_tagging_loss=0.009573, over 16327.00 frames. ], tot_loss[loss=0.08134, simple_loss=0.1016, pruned_loss=0.02035, audio_tagging_loss=0.01021, over 3040967.40 frames. ], batch size: 61, lr: 5.39e-03, grad_scale: 16.0 2023-11-20 06:21:21,308 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.109e+01 8.191e+01 8.714e+01 9.670e+01 1.123e+02, threshold=1.743e+02, percent-clipped=0.0 2023-11-20 06:21:30,142 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 146200 2023-11-20 06:21:36,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=974686.6666666666, ans=0.07 2023-11-20 06:21:51,474 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.23 vs. 
limit=22.5 2023-11-20 06:21:58,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=974753.3333333334, ans=0.125 2023-11-20 06:22:06,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=974820.0, ans=0.125 2023-11-20 06:22:06,817 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.01 vs. limit=15.0 2023-11-20 06:22:08,196 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.96 vs. limit=15.0 2023-11-20 06:22:16,090 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 1950, loss[loss=0.08767, simple_loss=0.1175, pruned_loss=0.02036, audio_tagging_loss=0.008563, over 15860.00 frames. ], tot_loss[loss=0.0808, simple_loss=0.1011, pruned_loss=0.02005, audio_tagging_loss=0.0102, over 3044117.49 frames. ], batch size: 56, lr: 5.39e-03, grad_scale: 16.0 2023-11-20 06:22:21,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=974886.6666666666, ans=0.1 2023-11-20 06:22:22,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=974886.6666666666, ans=0.0 2023-11-20 06:22:23,713 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 06:22:34,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=974953.3333333334, ans=0.125 2023-11-20 06:22:35,952 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 146250 2023-11-20 06:22:41,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=975020.0, ans=0.0 2023-11-20 06:22:51,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=975020.0, ans=0.125 2023-11-20 06:23:08,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=975153.3333333334, ans=0.1 2023-11-20 06:23:13,761 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.66 vs. limit=6.0 2023-11-20 06:23:21,425 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 2000, loss[loss=0.07548, simple_loss=0.09295, pruned_loss=0.02065, audio_tagging_loss=0.008349, over 14884.00 frames. ], tot_loss[loss=0.081, simple_loss=0.1012, pruned_loss=0.02025, audio_tagging_loss=0.01016, over 3042222.07 frames. 
], batch size: 58, lr: 5.39e-03, grad_scale: 32.0 2023-11-20 06:23:28,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=975220.0, ans=0.125 2023-11-20 06:23:29,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=975220.0, ans=0.0 2023-11-20 06:23:31,799 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.029e+01 7.759e+01 8.483e+01 9.476e+01 1.626e+02, threshold=1.697e+02, percent-clipped=0.0 2023-11-20 06:23:41,225 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 146300 2023-11-20 06:23:42,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=975286.6666666666, ans=0.2 2023-11-20 06:23:53,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=975353.3333333334, ans=0.2 2023-11-20 06:23:56,628 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.50 vs. limit=15.0 2023-11-20 06:23:57,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=975353.3333333334, ans=0.0 2023-11-20 06:24:26,436 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 2050, loss[loss=0.07559, simple_loss=0.09169, pruned_loss=0.01769, audio_tagging_loss=0.01205, over 14111.00 frames. ], tot_loss[loss=0.08076, simple_loss=0.1008, pruned_loss=0.02018, audio_tagging_loss=0.01018, over 3049634.84 frames. ], batch size: 52, lr: 5.39e-03, grad_scale: 32.0 2023-11-20 06:24:35,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=975553.3333333334, ans=0.125 2023-11-20 06:24:45,180 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 146350 2023-11-20 06:24:56,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=975686.6666666666, ans=0.125 2023-11-20 06:25:18,193 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.00 vs. limit=6.0 2023-11-20 06:25:25,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=975820.0, ans=0.0 2023-11-20 06:25:30,032 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 2100, loss[loss=0.06629, simple_loss=0.07791, pruned_loss=0.01528, audio_tagging_loss=0.01205, over 14815.00 frames. ], tot_loss[loss=0.08103, simple_loss=0.1012, pruned_loss=0.02033, audio_tagging_loss=0.01011, over 3045990.47 frames. 
2023-11-20 06:25:39,689 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.798e+01 8.216e+01 8.927e+01 9.679e+01 1.244e+02, threshold=1.785e+02, percent-clipped=0.0
2023-11-20 06:25:48,945 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 146400
2023-11-20 06:26:02,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=976020.0, ans=0.0
2023-11-20 06:26:24,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=976153.3333333334, ans=0.125
2023-11-20 06:26:26,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=976153.3333333334, ans=0.125
2023-11-20 06:26:33,948 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 2150, loss[loss=0.1013, simple_loss=0.1296, pruned_loss=0.02655, audio_tagging_loss=0.009992, over 14198.00 frames. ], tot_loss[loss=0.08127, simple_loss=0.1017, pruned_loss=0.02038, audio_tagging_loss=0.01004, over 3049511.13 frames. ], batch size: 53, lr: 5.39e-03, grad_scale: 32.0
2023-11-20 06:26:42,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=976220.0, ans=0.125
2023-11-20 06:26:43,277 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.59 vs. limit=22.5
2023-11-20 06:26:55,012 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 146450
2023-11-20 06:27:12,185 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 06:27:12,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=976420.0, ans=0.125
2023-11-20 06:27:17,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=976420.0, ans=0.0
2023-11-20 06:27:20,492 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.68 vs. limit=22.5
2023-11-20 06:27:25,377 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.33 vs. limit=22.5
2023-11-20 06:27:37,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=976486.6666666666, ans=0.0
2023-11-20 06:27:37,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=976486.6666666666, ans=0.0
2023-11-20 06:27:39,838 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 2200, loss[loss=0.08528, simple_loss=0.1123, pruned_loss=0.01977, audio_tagging_loss=0.009388, over 16496.00 frames. ], tot_loss[loss=0.08203, simple_loss=0.1027, pruned_loss=0.02056, audio_tagging_loss=0.01011, over 3052083.39 frames. ], batch size: 61, lr: 5.38e-03, grad_scale: 32.0
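The optim.py records summarize the recent distribution of gradient norms as five values (min, 25%, median, 75%, max) plus a clipping threshold; in every record above the threshold equals Clipping_scale times the median (e.g. 2.0 * 8.483e+01 = 1.697e+02), which points at median-relative clipping. A hedged sketch of that idea; MedianGradClipper is a hypothetical class, not the actual optimizer code:

```python
from collections import deque
import torch

class MedianGradClipper:
    """Clip at clipping_scale * median of a window of recent grad norms."""
    def __init__(self, clipping_scale=2.0, window=400):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)
        self.clipped = 0
        self.steps = 0

    def clip_(self, params):
        params = [p for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)
        q = torch.quantile(torch.tensor(list(self.norms)),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()
        self.steps += 1
        if norm > threshold:
            self.clipped += 1
            for p in params:
                p.grad.mul_(threshold / norm)
        # quartiles, threshold and percent-clipped, as in the log records
        return q, threshold, 100.0 * self.clipped / self.steps
```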
2023-11-20 06:27:43,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=976553.3333333334, ans=0.125
2023-11-20 06:27:44,941 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.61 vs. limit=22.5
2023-11-20 06:27:50,511 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.751e+01 8.222e+01 8.987e+01 9.495e+01 1.215e+02, threshold=1.797e+02, percent-clipped=0.0
2023-11-20 06:27:59,350 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 146500
2023-11-20 06:28:08,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=976686.6666666666, ans=0.05
2023-11-20 06:28:44,324 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 2250, loss[loss=0.1006, simple_loss=0.1286, pruned_loss=0.02651, audio_tagging_loss=0.009764, over 15118.00 frames. ], tot_loss[loss=0.08227, simple_loss=0.103, pruned_loss=0.02069, audio_tagging_loss=0.0101, over 3042613.84 frames. ], batch size: 57, lr: 5.38e-03, grad_scale: 32.0
2023-11-20 06:28:50,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=976886.6666666666, ans=0.1
2023-11-20 06:28:59,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=976953.3333333334, ans=0.5
2023-11-20 06:29:03,330 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 146550
2023-11-20 06:29:28,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=977086.6666666666, ans=0.125
2023-11-20 06:29:39,005 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.31 vs. limit=22.5
2023-11-20 06:29:48,048 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 2300, loss[loss=0.09215, simple_loss=0.126, pruned_loss=0.02034, audio_tagging_loss=0.008817, over 16062.00 frames. ], tot_loss[loss=0.08165, simple_loss=0.1023, pruned_loss=0.02042, audio_tagging_loss=0.01008, over 3044963.09 frames. ], batch size: 59, lr: 5.38e-03, grad_scale: 32.0
2023-11-20 06:29:58,630 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.545e+01 8.135e+01 8.852e+01 9.695e+01 1.259e+02, threshold=1.770e+02, percent-clipped=0.0
2023-11-20 06:30:07,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=977286.6666666666, ans=0.0
2023-11-20 06:30:08,677 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 146600
2023-11-20 06:30:17,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=977353.3333333334, ans=0.125
2023-11-20 06:30:31,030 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.85 vs. limit=15.0
2023-11-20 06:30:42,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=977486.6666666666, ans=0.125
2023-11-20 06:30:43,792 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.28 vs. limit=6.0
2023-11-20 06:30:44,349 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 06:30:53,529 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 2350, loss[loss=0.1068, simple_loss=0.1291, pruned_loss=0.03108, audio_tagging_loss=0.01121, over 15457.00 frames. ], tot_loss[loss=0.08165, simple_loss=0.1023, pruned_loss=0.02032, audio_tagging_loss=0.01016, over 3051151.18 frames. ], batch size: 57, lr: 5.38e-03, grad_scale: 32.0
2023-11-20 06:31:12,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=977620.0, ans=0.1
2023-11-20 06:31:13,197 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 146650
2023-11-20 06:31:27,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=977686.6666666666, ans=0.0
2023-11-20 06:31:42,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=977753.3333333334, ans=0.125
2023-11-20 06:31:49,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=977820.0, ans=0.125
2023-11-20 06:31:58,165 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 2400, loss[loss=0.07883, simple_loss=0.09903, pruned_loss=0.01923, audio_tagging_loss=0.01009, over 15146.00 frames. ], tot_loss[loss=0.08165, simple_loss=0.1022, pruned_loss=0.02024, audio_tagging_loss=0.01033, over 3046721.45 frames. ], batch size: 56, lr: 5.38e-03, grad_scale: 32.0
2023-11-20 06:32:09,025 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.339e+01 8.246e+01 8.786e+01 9.716e+01 1.266e+02, threshold=1.757e+02, percent-clipped=0.0
2023-11-20 06:32:09,695 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.36 vs. limit=12.0
2023-11-20 06:32:14,399 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.85 vs. limit=15.0
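The WARNING records drop AudioSet placeholder cuts whose transcripts cannot be aligned: after the encoder's roughly 4x subsampling, 100 input frames leave only 23 output frames, fewer than the 24 BPE tokens of the dummy text, so the transducer loss is infeasible. A hedged sketch of such a filter; the subsampling formula below reproduces 100 -> 23 for this model but is an assumption about the exact code:

```python
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Frames left after the convolutional front end (~4x subsampling);
    # assumed formula, chosen because it maps 100 -> 23 as in the log.
    frames_after = ((num_frames - 7) // 2 + 1) // 2
    # Need at least one output frame per emitted token.
    return frames_after >= num_tokens

print(keep_cut(100, 24))  # False: 23 frames < 24 tokens -> cut is excluded
```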
2023-11-20 06:32:16,321 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 146700
2023-11-20 06:32:17,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=977953.3333333334, ans=0.05
2023-11-20 06:32:20,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=977953.3333333334, ans=0.125
2023-11-20 06:32:31,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=978020.0, ans=0.125
2023-11-20 06:32:35,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=978086.6666666666, ans=0.125
2023-11-20 06:32:44,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=978086.6666666666, ans=0.125
2023-11-20 06:32:50,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=978153.3333333334, ans=0.0
2023-11-20 06:32:54,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=978153.3333333334, ans=0.125
2023-11-20 06:33:01,533 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 2450, loss[loss=0.08796, simple_loss=0.1031, pruned_loss=0.02809, audio_tagging_loss=0.008313, over 15653.00 frames. ], tot_loss[loss=0.08139, simple_loss=0.1015, pruned_loss=0.02022, audio_tagging_loss=0.01043, over 3041254.24 frames. ], batch size: 60, lr: 5.38e-03, grad_scale: 32.0
2023-11-20 06:33:10,447 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.12 vs. limit=15.0
2023-11-20 06:33:21,612 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 146750
2023-11-20 06:33:35,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=978353.3333333334, ans=0.1
2023-11-20 06:33:48,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=978420.0, ans=0.1
2023-11-20 06:33:48,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=978420.0, ans=0.125
2023-11-20 06:33:52,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=978486.6666666666, ans=0.0
2023-11-20 06:33:58,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=978486.6666666666, ans=0.95
2023-11-20 06:34:06,459 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 2500, loss[loss=0.07895, simple_loss=0.1046, pruned_loss=0.0189, audio_tagging_loss=0.007748, over 15116.00 frames. ], tot_loss[loss=0.08141, simple_loss=0.1016, pruned_loss=0.02019, audio_tagging_loss=0.01043, over 3046075.40 frames. ], batch size: 55, lr: 5.38e-03, grad_scale: 32.0
2023-11-20 06:34:12,778 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-20 06:34:18,577 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.294e+01 8.313e+01 9.053e+01 9.691e+01 1.783e+02, threshold=1.811e+02, percent-clipped=1.0
2023-11-20 06:34:26,053 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 146800
2023-11-20 06:34:26,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=978620.0, ans=0.0
2023-11-20 06:34:32,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=978686.6666666666, ans=0.2
2023-11-20 06:34:47,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=978753.3333333334, ans=0.1
2023-11-20 06:34:49,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=978753.3333333334, ans=0.05
2023-11-20 06:35:00,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=978820.0, ans=0.125
2023-11-20 06:35:11,793 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 2550, loss[loss=0.06686, simple_loss=0.06944, pruned_loss=0.0149, audio_tagging_loss=0.01724, over 15454.00 frames. ], tot_loss[loss=0.08113, simple_loss=0.1012, pruned_loss=0.02022, audio_tagging_loss=0.0103, over 3049788.23 frames. ], batch size: 60, lr: 5.38e-03, grad_scale: 32.0
2023-11-20 06:35:14,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=978886.6666666666, ans=0.125
2023-11-20 06:35:23,511 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.43 vs. limit=12.0
2023-11-20 06:35:29,930 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 146850
2023-11-20 06:35:50,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=979086.6666666666, ans=0.0
2023-11-20 06:35:54,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=979086.6666666666, ans=0.125
2023-11-20 06:35:55,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=979086.6666666666, ans=0.2
2023-11-20 06:36:07,087 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.78 vs. limit=15.0
2023-11-20 06:36:13,429 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.64 vs. limit=22.5
2023-11-20 06:36:15,064 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 2600, loss[loss=0.104, simple_loss=0.1299, pruned_loss=0.03007, audio_tagging_loss=0.008989, over 15390.00 frames. ], tot_loss[loss=0.08059, simple_loss=0.1008, pruned_loss=0.02005, audio_tagging_loss=0.01016, over 3042518.79 frames. ], batch size: 58, lr: 5.38e-03, grad_scale: 32.0
2023-11-20 06:36:15,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=979220.0, ans=0.2
2023-11-20 06:36:19,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=979220.0, ans=0.2
2023-11-20 06:36:26,630 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.093e+01 8.215e+01 8.794e+01 9.573e+01 1.201e+02, threshold=1.759e+02, percent-clipped=0.0
2023-11-20 06:36:34,893 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 146900
2023-11-20 06:36:36,828 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.08 vs. limit=22.5
2023-11-20 06:36:49,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=979353.3333333334, ans=0.07
2023-11-20 06:36:58,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=979420.0, ans=0.1
2023-11-20 06:37:03,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=979420.0, ans=0.125
2023-11-20 06:37:14,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=979486.6666666666, ans=15.0
2023-11-20 06:37:20,160 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 2650, loss[loss=0.05838, simple_loss=0.06547, pruned_loss=0.01181, audio_tagging_loss=0.01383, over 14865.00 frames. ], tot_loss[loss=0.08111, simple_loss=0.1014, pruned_loss=0.02023, audio_tagging_loss=0.01019, over 3044028.34 frames. ], batch size: 59, lr: 5.38e-03, grad_scale: 16.0
2023-11-20 06:37:20,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=979553.3333333334, ans=0.0
2023-11-20 06:37:39,795 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 146950
2023-11-20 06:37:57,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=979753.3333333334, ans=0.2
2023-11-20 06:37:58,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=979753.3333333334, ans=0.0
2023-11-20 06:38:07,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=979753.3333333334, ans=0.0
2023-11-20 06:38:18,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=979820.0, ans=0.2
2023-11-20 06:38:24,554 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 2700, loss[loss=0.1048, simple_loss=0.1325, pruned_loss=0.02939, audio_tagging_loss=0.009171, over 16705.00 frames. ], tot_loss[loss=0.08074, simple_loss=0.1009, pruned_loss=0.02018, audio_tagging_loss=0.01012, over 3046585.68 frames. ], batch size: 60, lr: 5.38e-03, grad_scale: 16.0
2023-11-20 06:38:37,517 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.654e+01 8.126e+01 8.706e+01 9.526e+01 1.459e+02, threshold=1.741e+02, percent-clipped=0.0
2023-11-20 06:38:43,841 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 147000
2023-11-20 06:38:53,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=980020.0, ans=0.125
2023-11-20 06:39:00,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=980020.0, ans=0.125
2023-11-20 06:39:14,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=980086.6666666666, ans=0.2
2023-11-20 06:39:29,320 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 2750, loss[loss=0.0797, simple_loss=0.09487, pruned_loss=0.02079, audio_tagging_loss=0.01148, over 16130.00 frames. ], tot_loss[loss=0.08037, simple_loss=0.1003, pruned_loss=0.02007, audio_tagging_loss=0.01016, over 3037981.30 frames. ], batch size: 62, lr: 5.37e-03, grad_scale: 16.0
2023-11-20 06:39:39,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=980220.0, ans=0.1
2023-11-20 06:39:49,318 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 147050
2023-11-20 06:39:49,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=980286.6666666666, ans=0.0
2023-11-20 06:40:01,371 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.30 vs. limit=10.0
2023-11-20 06:40:11,856 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=22.5
2023-11-20 06:40:23,478 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 06:40:27,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=980486.6666666666, ans=0.1
2023-11-20 06:40:30,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=980486.6666666666, ans=0.125
2023-11-20 06:40:31,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=980486.6666666666, ans=10.0
2023-11-20 06:40:33,978 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 2800, loss[loss=0.05604, simple_loss=0.05936, pruned_loss=0.01691, audio_tagging_loss=0.009452, over 13655.00 frames. ], tot_loss[loss=0.07996, simple_loss=0.09967, pruned_loss=0.01997, audio_tagging_loss=0.01015, over 3037366.63 frames. ], batch size: 54, lr: 5.37e-03, grad_scale: 16.0
2023-11-20 06:40:35,001 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.45 vs. limit=5.0
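Each Whitening record compares a per-module scalar "metric" against a limit; the metric reads like a measure of how uneven the eigenvalue spectrum of the activation covariance is (1.0 for perfectly "white" channels), and the "vs. limit" phrasing suggests a penalty is applied only when the metric exceeds the limit. A hedged sketch of one plausible metric; the actual zipformer formula may differ:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels); channels are split into groups.
    num_channels = x.shape[1]
    xg = x.reshape(-1, num_groups, num_channels // num_groups)
    metrics = []
    for g in range(num_groups):
        f = xg[:, g, :] - xg[:, g, :].mean(dim=0)
        cov = f.t() @ f / f.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        # Equals 1.0 when all eigenvalues are equal; grows with spread.
        metrics.append((eigs ** 2).mean() / eigs.mean() ** 2)
    return float(torch.stack(metrics).mean())

x = torch.randn(2000, 384)     # roughly white activations
print(whitening_metric(x))     # near 1, far below a limit like 15.0
```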
2023-11-20 06:40:45,053 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.34 vs. limit=15.0
2023-11-20 06:40:48,688 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.373e+01 8.295e+01 8.926e+01 1.019e+02 1.823e+02, threshold=1.785e+02, percent-clipped=1.0
2023-11-20 06:40:52,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=980620.0, ans=0.1
2023-11-20 06:40:52,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=980620.0, ans=0.125
2023-11-20 06:40:53,885 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 147100
2023-11-20 06:40:55,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=980620.0, ans=0.025
2023-11-20 06:41:14,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=980753.3333333334, ans=0.05
2023-11-20 06:41:30,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=980820.0, ans=0.125
2023-11-20 06:41:39,271 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 2850, loss[loss=0.086, simple_loss=0.1046, pruned_loss=0.02448, audio_tagging_loss=0.009249, over 14789.00 frames. ], tot_loss[loss=0.08001, simple_loss=0.1001, pruned_loss=0.01991, audio_tagging_loss=0.01007, over 3033177.66 frames. ], batch size: 56, lr: 5.37e-03, grad_scale: 16.0
2023-11-20 06:41:50,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=980953.3333333334, ans=0.0
2023-11-20 06:41:58,394 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 147150
2023-11-20 06:42:11,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=981020.0, ans=0.125
2023-11-20 06:42:17,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=981086.6666666666, ans=0.0
2023-11-20 06:42:19,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=981086.6666666666, ans=0.125
2023-11-20 06:42:28,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=981086.6666666666, ans=0.0
2023-11-20 06:42:33,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=981153.3333333334, ans=0.125
2023-11-20 06:42:37,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=981153.3333333334, ans=0.0
2023-11-20 06:42:43,707 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 2900, loss[loss=0.07872, simple_loss=0.09889, pruned_loss=0.01895, audio_tagging_loss=0.01032, over 14877.00 frames. ], tot_loss[loss=0.08006, simple_loss=0.1003, pruned_loss=0.01985, audio_tagging_loss=0.01005, over 3039052.74 frames. ], batch size: 55, lr: 5.37e-03, grad_scale: 16.0
2023-11-20 06:42:43,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=981220.0, ans=0.125
2023-11-20 06:42:57,846 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.790e+01 8.308e+01 8.941e+01 9.842e+01 1.369e+02, threshold=1.788e+02, percent-clipped=0.0
2023-11-20 06:43:02,753 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 147200
2023-11-20 06:43:29,369 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.65 vs. limit=15.0
2023-11-20 06:43:40,872 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.34 vs. limit=6.0
2023-11-20 06:43:41,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=981486.6666666666, ans=0.1
2023-11-20 06:43:48,113 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 2950, loss[loss=0.08541, simple_loss=0.1066, pruned_loss=0.02473, audio_tagging_loss=0.007382, over 15047.00 frames. ], tot_loss[loss=0.0809, simple_loss=0.1014, pruned_loss=0.02029, audio_tagging_loss=0.009929, over 3047923.01 frames. ], batch size: 55, lr: 5.37e-03, grad_scale: 16.0
2023-11-20 06:43:53,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=981553.3333333334, ans=0.0
2023-11-20 06:44:05,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=981620.0, ans=0.1
2023-11-20 06:44:07,691 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 147250
2023-11-20 06:44:34,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=981753.3333333334, ans=0.0
2023-11-20 06:44:50,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=981820.0, ans=0.125
2023-11-20 06:44:53,068 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 3000, loss[loss=0.1027, simple_loss=0.1326, pruned_loss=0.02687, audio_tagging_loss=0.009465, over 14828.00 frames. ], tot_loss[loss=0.08104, simple_loss=0.1012, pruned_loss=0.02038, audio_tagging_loss=0.01004, over 3047940.21 frames. ], batch size: 57, lr: 5.37e-03, grad_scale: 16.0
2023-11-20 06:44:53,071 INFO [train_asr.py:1285] (0/4) Computing validation loss
2023-11-20 06:45:11,371 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.1465, 3.9682, 3.6835, 3.6570], device='cuda:0')
2023-11-20 06:45:25,287 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8739, 3.3025, 4.8036, 4.3180], device='cuda:0')
2023-11-20 06:45:31,945 INFO [train_asr.py:1294] (0/4) Epoch 13, validation: loss=0.06242, simple_loss=0.05394, pruned_loss=0.005804, audio_tagging_loss=0.02964, over 4681554.00 frames.
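At batch 3000 the loop pauses for validation (matching this run's valid_interval), logs a couple of attention-entropy diagnostics, reports the dev-set loss over ~4.7M frames, and then prints peak GPU memory. A hedged sketch of that interleaving; torch.cuda.max_memory_allocated() is the standard PyTorch call behind the memory line, while the loop structure and the compute_loss helper are assumptions:

```python
import torch

def maybe_validate(batch_idx, model, valid_loader, compute_loss,
                   valid_interval=3000):
    if batch_idx == 0 or batch_idx % valid_interval != 0:
        return
    model.eval()
    with torch.no_grad():
        # compute_loss is assumed to return a float loss for one batch
        losses = [compute_loss(model, batch) for batch in valid_loader]
    model.train()
    mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
    print(f"validation: loss={sum(losses) / len(losses):.5f}")
    print(f"Maximum memory allocated so far is {mb}MB")
```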
2023-11-20 06:45:31,946 INFO [train_asr.py:1295] (0/4) Maximum memory allocated so far is 25925MB
2023-11-20 06:45:33,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=981886.6666666666, ans=0.125
2023-11-20 06:45:44,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=981953.3333333334, ans=0.0
2023-11-20 06:45:46,808 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.395e+01 8.201e+01 8.897e+01 9.903e+01 1.229e+02, threshold=1.779e+02, percent-clipped=0.0
2023-11-20 06:45:48,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=981953.3333333334, ans=0.0
2023-11-20 06:45:52,539 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 147300
2023-11-20 06:46:36,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=982220.0, ans=0.05
2023-11-20 06:46:37,627 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 3050, loss[loss=0.09479, simple_loss=0.1307, pruned_loss=0.02171, audio_tagging_loss=0.007716, over 15626.00 frames. ], tot_loss[loss=0.08096, simple_loss=0.1013, pruned_loss=0.02018, audio_tagging_loss=0.01014, over 3052682.22 frames. ], batch size: 55, lr: 5.37e-03, grad_scale: 16.0
2023-11-20 06:46:40,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=982220.0, ans=0.125
2023-11-20 06:46:42,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=982220.0, ans=0.125
2023-11-20 06:46:51,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=982286.6666666666, ans=0.0
2023-11-20 06:46:57,050 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 147350
2023-11-20 06:46:57,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=982286.6666666666, ans=0.125
2023-11-20 06:46:57,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=982286.6666666666, ans=0.125
2023-11-20 06:46:59,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=982286.6666666666, ans=0.2
2023-11-20 06:47:07,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=982353.3333333334, ans=0.5
2023-11-20 06:47:13,010 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 06:47:23,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=982420.0, ans=0.0
2023-11-20 06:47:42,632 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 3100, loss[loss=0.07461, simple_loss=0.09139, pruned_loss=0.01747, audio_tagging_loss=0.01145, over 14162.00 frames. ], tot_loss[loss=0.0808, simple_loss=0.1007, pruned_loss=0.02016, audio_tagging_loss=0.01028, over 3053324.25 frames. ], batch size: 56, lr: 5.37e-03, grad_scale: 16.0
2023-11-20 06:47:53,001 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.37 vs. limit=15.0
2023-11-20 06:47:53,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=982620.0, ans=0.2
2023-11-20 06:47:55,952 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.221e+01 8.032e+01 8.672e+01 9.635e+01 1.172e+02, threshold=1.734e+02, percent-clipped=0.0
2023-11-20 06:48:00,929 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 147400
2023-11-20 06:48:01,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=982620.0, ans=0.09899494936611666
2023-11-20 06:48:19,433 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 06:48:22,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=982753.3333333334, ans=0.125
2023-11-20 06:48:23,982 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.49 vs. limit=6.0
2023-11-20 06:48:36,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=982820.0, ans=0.1
2023-11-20 06:48:46,329 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=8.177e-03
2023-11-20 06:48:47,365 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 3150, loss[loss=0.087, simple_loss=0.1088, pruned_loss=0.0245, audio_tagging_loss=0.00811, over 14446.00 frames. ], tot_loss[loss=0.08072, simple_loss=0.1005, pruned_loss=0.02012, audio_tagging_loss=0.01037, over 3050896.70 frames. ], batch size: 54, lr: 5.37e-03, grad_scale: 16.0
2023-11-20 06:49:06,548 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 147450
2023-11-20 06:49:14,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=983020.0, ans=0.2
2023-11-20 06:49:26,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=983086.6666666666, ans=0.0
2023-11-20 06:49:33,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=983086.6666666666, ans=0.2
2023-11-20 06:49:38,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=983153.3333333334, ans=0.0
2023-11-20 06:49:51,436 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.38 vs. limit=12.0
2023-11-20 06:49:52,025 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 3200, loss[loss=0.06833, simple_loss=0.08315, pruned_loss=0.01746, audio_tagging_loss=0.009296, over 14272.00 frames. ], tot_loss[loss=0.0804, simple_loss=0.1002, pruned_loss=0.01992, audio_tagging_loss=0.01039, over 3051020.02 frames. ], batch size: 57, lr: 5.37e-03, grad_scale: 32.0
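The grad_scale field that steps between 16.0 and 32.0 here (and later down to 8.0) is the dynamic fp16 loss scale: it doubles after a run of overflow-free steps and halves when gradients overflow. A minimal sketch with the standard torch.cuda.amp API; the training script's own wrapper may differ:

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)

def fp16_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # silently skipped if gradients overflowed
    scaler.update()          # grows or shrinks the scale
    return loss.detach(), scaler.get_scale()   # cf. the logged grad_scale
```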
2023-11-20 06:50:05,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=983286.6666666666, ans=0.0
2023-11-20 06:50:06,131 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.614e+01 8.160e+01 8.999e+01 9.638e+01 2.555e+02, threshold=1.800e+02, percent-clipped=1.0
2023-11-20 06:50:06,960 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.65 vs. limit=10.0
2023-11-20 06:50:10,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=983286.6666666666, ans=0.1
2023-11-20 06:50:11,817 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 147500
2023-11-20 06:50:26,670 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-20 06:50:56,500 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 3250, loss[loss=0.1119, simple_loss=0.148, pruned_loss=0.0288, audio_tagging_loss=0.009081, over 15857.00 frames. ], tot_loss[loss=0.08114, simple_loss=0.1013, pruned_loss=0.02015, audio_tagging_loss=0.01035, over 3045027.32 frames. ], batch size: 56, lr: 5.37e-03, grad_scale: 32.0
2023-11-20 06:51:15,556 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 147550
2023-11-20 06:51:32,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=983686.6666666666, ans=0.09899494936611666
2023-11-20 06:51:40,032 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.92 vs. limit=10.0
2023-11-20 06:51:42,618 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.53 vs. limit=22.5
2023-11-20 06:51:55,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=983820.0, ans=0.125
2023-11-20 06:52:00,908 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 3300, loss[loss=0.09317, simple_loss=0.1198, pruned_loss=0.02621, audio_tagging_loss=0.007049, over 15110.00 frames. ], tot_loss[loss=0.08128, simple_loss=0.1014, pruned_loss=0.02016, audio_tagging_loss=0.0104, over 3051820.52 frames. ], batch size: 57, lr: 5.36e-03, grad_scale: 32.0
2023-11-20 06:52:02,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=983886.6666666666, ans=0.125
2023-11-20 06:52:12,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=983953.3333333334, ans=0.125
2023-11-20 06:52:14,569 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.228e+01 8.367e+01 9.193e+01 1.015e+02 1.401e+02, threshold=1.839e+02, percent-clipped=0.0
2023-11-20 06:52:20,256 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 147600
2023-11-20 06:52:24,342 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.42 vs. limit=15.0
2023-11-20 06:52:27,905 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0
2023-11-20 06:52:39,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=984086.6666666666, ans=0.95
2023-11-20 06:53:04,356 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.60 vs. limit=10.0
2023-11-20 06:53:05,441 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 3350, loss[loss=0.07689, simple_loss=0.09561, pruned_loss=0.01804, audio_tagging_loss=0.01105, over 15060.00 frames. ], tot_loss[loss=0.08119, simple_loss=0.1015, pruned_loss=0.0201, audio_tagging_loss=0.01034, over 3052919.96 frames. ], batch size: 54, lr: 5.36e-03, grad_scale: 32.0
2023-11-20 06:53:26,294 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 147650
2023-11-20 06:54:08,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=984486.6666666666, ans=0.125
2023-11-20 06:54:08,242 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 06:54:11,600 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 3400, loss[loss=0.08171, simple_loss=0.1117, pruned_loss=0.01654, audio_tagging_loss=0.009314, over 15112.00 frames. ], tot_loss[loss=0.08123, simple_loss=0.102, pruned_loss=0.02009, audio_tagging_loss=0.01014, over 3051163.23 frames. ], batch size: 56, lr: 5.36e-03, grad_scale: 32.0
2023-11-20 06:54:25,755 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.861e+01 8.407e+01 9.126e+01 1.002e+02 1.896e+02, threshold=1.825e+02, percent-clipped=1.0
2023-11-20 06:54:31,029 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 147700
2023-11-20 06:54:33,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=984620.0, ans=0.05
2023-11-20 06:54:33,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=984620.0, ans=0.0
2023-11-20 06:55:07,173 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.56 vs. limit=15.0
2023-11-20 06:55:07,349 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.04 vs. limit=6.0
2023-11-20 06:55:16,222 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 3450, loss[loss=0.08235, simple_loss=0.09514, pruned_loss=0.02342, audio_tagging_loss=0.01136, over 15721.00 frames. ], tot_loss[loss=0.08104, simple_loss=0.1019, pruned_loss=0.02004, audio_tagging_loss=0.01003, over 3050545.43 frames. ], batch size: 61, lr: 5.36e-03, grad_scale: 32.0
2023-11-20 06:55:22,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=984886.6666666666, ans=0.2
2023-11-20 06:55:24,169 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.19 vs. limit=15.0
2023-11-20 06:55:35,453 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 147750
2023-11-20 06:55:43,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=985020.0, ans=0.125
2023-11-20 06:56:02,358 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.75 vs. limit=15.0
2023-11-20 06:56:07,110 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.08 vs. limit=15.0
2023-11-20 06:56:15,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=985153.3333333334, ans=0.0
2023-11-20 06:56:19,942 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 3500, loss[loss=0.09754, simple_loss=0.1089, pruned_loss=0.03379, audio_tagging_loss=0.009318, over 13442.00 frames. ], tot_loss[loss=0.08049, simple_loss=0.1009, pruned_loss=0.02001, audio_tagging_loss=0.01004, over 3051094.64 frames. ], batch size: 53, lr: 5.36e-03, grad_scale: 32.0
2023-11-20 06:56:23,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=985220.0, ans=0.0
2023-11-20 06:56:23,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=985220.0, ans=0.95
2023-11-20 06:56:25,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=985220.0, ans=0.05
2023-11-20 06:56:34,560 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 8.197e+01 8.889e+01 1.015e+02 2.808e+02, threshold=1.778e+02, percent-clipped=1.0
2023-11-20 06:56:37,062 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.26 vs. limit=15.0
2023-11-20 06:56:40,133 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 147800
2023-11-20 06:56:52,845 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 06:56:58,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=985420.0, ans=15.0
2023-11-20 06:56:58,398 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.76 vs. limit=15.0
2023-11-20 06:57:00,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=985420.0, ans=0.1
2023-11-20 06:57:03,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=985420.0, ans=0.125
2023-11-20 06:57:18,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=985486.6666666666, ans=0.0
2023-11-20 06:57:24,521 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 3550, loss[loss=0.07746, simple_loss=0.08679, pruned_loss=0.02169, audio_tagging_loss=0.01237, over 15799.00 frames. ], tot_loss[loss=0.08062, simple_loss=0.1009, pruned_loss=0.02014, audio_tagging_loss=0.01004, over 3049587.12 frames. ], batch size: 62, lr: 5.36e-03, grad_scale: 16.0
2023-11-20 06:57:30,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=985553.3333333334, ans=0.0
2023-11-20 06:57:44,159 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 147850
2023-11-20 06:57:49,867 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.46 vs. limit=15.0
2023-11-20 06:57:50,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=985686.6666666666, ans=0.0
2023-11-20 06:57:50,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=985686.6666666666, ans=0.2
2023-11-20 06:57:52,013 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.80 vs. limit=15.0
2023-11-20 06:58:02,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=985753.3333333334, ans=0.125
2023-11-20 06:58:26,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=985820.0, ans=0.125
2023-11-20 06:58:29,799 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 3600, loss[loss=0.06087, simple_loss=0.073, pruned_loss=0.01337, audio_tagging_loss=0.01101, over 14851.00 frames. ], tot_loss[loss=0.08028, simple_loss=0.1004, pruned_loss=0.01999, audio_tagging_loss=0.01008, over 3038470.28 frames. ], batch size: 57, lr: 5.36e-03, grad_scale: 32.0
2023-11-20 06:58:41,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=985953.3333333334, ans=0.1
2023-11-20 06:58:44,589 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.072e+01 8.078e+01 8.812e+01 9.931e+01 2.962e+02, threshold=1.762e+02, percent-clipped=1.0
2023-11-20 06:58:48,357 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 147900
2023-11-20 06:59:33,133 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 3650, loss[loss=0.1055, simple_loss=0.1238, pruned_loss=0.03047, audio_tagging_loss=0.01315, over 15063.00 frames. ], tot_loss[loss=0.08025, simple_loss=0.1001, pruned_loss=0.02017, audio_tagging_loss=0.01003, over 3037420.12 frames. ], batch size: 54, lr: 5.36e-03, grad_scale: 32.0
2023-11-20 06:59:44,039 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 06:59:52,933 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 147950
2023-11-20 07:00:15,394 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.77 vs. limit=15.0
2023-11-20 07:00:21,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=986420.0, ans=0.1
2023-11-20 07:00:28,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=986486.6666666666, ans=0.2
2023-11-20 07:00:38,019 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 3700, loss[loss=0.06792, simple_loss=0.07676, pruned_loss=0.01907, audio_tagging_loss=0.01047, over 15960.00 frames. ], tot_loss[loss=0.08039, simple_loss=0.1003, pruned_loss=0.02018, audio_tagging_loss=0.01004, over 3044857.83 frames. ], batch size: 62, lr: 5.36e-03, grad_scale: 32.0
2023-11-20 07:00:53,835 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.585e+01 7.922e+01 8.709e+01 9.261e+01 1.390e+02, threshold=1.742e+02, percent-clipped=0.0
2023-11-20 07:00:57,665 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 148000
2023-11-20 07:00:59,155 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-148000.pt
2023-11-20 07:01:08,432 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.19 vs. limit=15.0
2023-11-20 07:01:22,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=986753.3333333334, ans=0.1
2023-11-20 07:01:39,113 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=22.5
2023-11-20 07:01:43,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=986820.0, ans=0.025
2023-11-20 07:01:47,078 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 3750, loss[loss=0.06729, simple_loss=0.08075, pruned_loss=0.01239, audio_tagging_loss=0.01453, over 15126.00 frames. ], tot_loss[loss=0.08015, simple_loss=0.1004, pruned_loss=0.01998, audio_tagging_loss=0.009985, over 3048307.68 frames. ], batch size: 57, lr: 5.36e-03, grad_scale: 32.0
2023-11-20 07:01:53,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=986886.6666666666, ans=0.0
2023-11-20 07:02:02,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=986953.3333333334, ans=0.125
2023-11-20 07:02:05,099 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.51 vs. limit=15.0
2023-11-20 07:02:05,723 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 148050
2023-11-20 07:02:15,424 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.80 vs. limit=15.0
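Batch index 148000 is a multiple of this run's save_every_n of 4000, which is what triggers the checkpoint-148000.pt write above. A hedged sketch of batch-indexed checkpointing; the helper name and the saved fields are assumptions, not the icefall checkpoint.py schema:

```python
from pathlib import Path
import torch

def maybe_save_checkpoint(model, optimizer, batch_idx_train,
                          exp_dir: Path, save_every_n: int = 4000):
    if batch_idx_train % save_every_n != 0:
        return None
    path = exp_dir / f"checkpoint-{batch_idx_train}.pt"
    torch.save({"model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "batch_idx_train": batch_idx_train}, path)
    return path
```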
limit=15.0 2023-11-20 07:02:21,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=987020.0, ans=0.0 2023-11-20 07:02:21,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=987020.0, ans=0.125 2023-11-20 07:02:29,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=987086.6666666666, ans=0.125 2023-11-20 07:02:30,755 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 07:02:49,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=987153.3333333334, ans=0.0 2023-11-20 07:02:51,363 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 3800, loss[loss=0.08826, simple_loss=0.1065, pruned_loss=0.02342, audio_tagging_loss=0.0116, over 16216.00 frames. ], tot_loss[loss=0.08067, simple_loss=0.1009, pruned_loss=0.02015, audio_tagging_loss=0.0101, over 3047763.53 frames. ], batch size: 59, lr: 5.36e-03, grad_scale: 32.0 2023-11-20 07:03:04,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=987286.6666666666, ans=0.125 2023-11-20 07:03:07,208 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.518e+01 8.071e+01 8.832e+01 9.688e+01 1.208e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-20 07:03:10,947 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 148100 2023-11-20 07:03:22,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=987353.3333333334, ans=0.0 2023-11-20 07:03:55,676 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 3850, loss[loss=0.07741, simple_loss=0.09763, pruned_loss=0.01725, audio_tagging_loss=0.01134, over 15887.00 frames. ], tot_loss[loss=0.08056, simple_loss=0.1009, pruned_loss=0.02001, audio_tagging_loss=0.01013, over 3046126.76 frames. 
], batch size: 60, lr: 5.35e-03, grad_scale: 32.0 2023-11-20 07:03:59,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=987553.3333333334, ans=0.0 2023-11-20 07:04:03,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=987553.3333333334, ans=0.125 2023-11-20 07:04:08,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=987620.0, ans=0.1 2023-11-20 07:04:09,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=987620.0, ans=0.0 2023-11-20 07:04:15,173 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 148150 2023-11-20 07:04:21,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=987686.6666666666, ans=0.5 2023-11-20 07:04:24,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=987686.6666666666, ans=0.1 2023-11-20 07:04:28,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=987686.6666666666, ans=0.1 2023-11-20 07:04:52,653 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.63 vs. limit=15.0 2023-11-20 07:05:00,259 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 3900, loss[loss=0.09215, simple_loss=0.1149, pruned_loss=0.02435, audio_tagging_loss=0.01034, over 16705.00 frames. ], tot_loss[loss=0.08089, simple_loss=0.1013, pruned_loss=0.02007, audio_tagging_loss=0.01016, over 3042713.22 frames. ], batch size: 59, lr: 5.35e-03, grad_scale: 32.0 2023-11-20 07:05:15,605 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.880e+01 8.310e+01 9.082e+01 9.917e+01 1.355e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-20 07:05:19,477 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 148200 2023-11-20 07:05:26,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=988020.0, ans=0.2 2023-11-20 07:05:43,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=988086.6666666666, ans=0.125 2023-11-20 07:05:46,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=988086.6666666666, ans=0.0 2023-11-20 07:05:58,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=988153.3333333334, ans=0.125 2023-11-20 07:06:01,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=988153.3333333334, ans=0.0 2023-11-20 07:06:05,386 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 3950, loss[loss=0.08387, simple_loss=0.1048, pruned_loss=0.01798, audio_tagging_loss=0.01349, over 14825.00 frames. ], tot_loss[loss=0.08203, simple_loss=0.1028, pruned_loss=0.02041, audio_tagging_loss=0.01023, over 3041920.88 frames. 
2023-11-20 07:06:25,456 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 148250
2023-11-20 07:06:33,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=988353.3333333334, ans=0.2
2023-11-20 07:06:45,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=988420.0, ans=0.125
2023-11-20 07:06:52,496 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.15 vs. limit=15.0
2023-11-20 07:06:57,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=988486.6666666666, ans=0.125
2023-11-20 07:07:10,816 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 4000, loss[loss=0.09268, simple_loss=0.1215, pruned_loss=0.02179, audio_tagging_loss=0.01017, over 15528.00 frames. ], tot_loss[loss=0.08307, simple_loss=0.1038, pruned_loss=0.02086, audio_tagging_loss=0.01029, over 3043878.44 frames. ], batch size: 55, lr: 5.35e-03, grad_scale: 32.0
2023-11-20 07:07:11,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=988553.3333333334, ans=0.125
2023-11-20 07:07:26,771 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.067e+01 8.419e+01 9.137e+01 9.996e+01 1.272e+02, threshold=1.827e+02, percent-clipped=0.0
2023-11-20 07:07:30,583 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 148300
2023-11-20 07:07:38,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=988686.6666666666, ans=10.0
2023-11-20 07:07:45,447 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.22 vs. limit=6.0
2023-11-20 07:07:46,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=988686.6666666666, ans=0.125
2023-11-20 07:07:58,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=988753.3333333334, ans=0.125
2023-11-20 07:08:09,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=988820.0, ans=0.0
2023-11-20 07:08:12,604 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.72 vs. limit=15.0
2023-11-20 07:08:16,233 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 4050, loss[loss=0.07259, simple_loss=0.09369, pruned_loss=0.01561, audio_tagging_loss=0.01013, over 15328.00 frames. ], tot_loss[loss=0.08224, simple_loss=0.1029, pruned_loss=0.02042, audio_tagging_loss=0.01035, over 3029432.00 frames. ], batch size: 56, lr: 5.35e-03, grad_scale: 32.0
2023-11-20 07:08:18,716 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
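The "Exclude cut" warnings follow a simple validity check: after the frontend's roughly 4x subsampling, an utterance must have at least as many frames as BPE tokens, otherwise the transducer loss is undefined for it. A hedged sketch of that rule (the exact icefall filter may differ in details); the frame arithmetic below reproduces the 100 -> 23 figures printed in the warnings:

    # Sketch of the cut filter implied by the WARNING lines (assumption; the
    # exact train_asr.py code may differ). The subsampled-length formula
    # reproduces the log's numbers: 100 input frames -> 23 output frames.
    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        T = ((num_frames - 7) // 2 + 1) // 2  # conv frontend, ~4x subsampling
        return T >= num_tokens                # 23 >= 24 fails -> cut excluded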
2023-11-20 07:08:27,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=988953.3333333334, ans=0.125
2023-11-20 07:08:34,938 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 148350
2023-11-20 07:08:37,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=988953.3333333334, ans=0.2
2023-11-20 07:08:50,701 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.35 vs. limit=15.0
2023-11-20 07:09:20,351 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 4100, loss[loss=0.07556, simple_loss=0.09401, pruned_loss=0.01479, audio_tagging_loss=0.01376, over 16606.00 frames. ], tot_loss[loss=0.0817, simple_loss=0.1021, pruned_loss=0.0203, audio_tagging_loss=0.01034, over 3031839.64 frames. ], batch size: 62, lr: 5.35e-03, grad_scale: 16.0
2023-11-20 07:09:30,575 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.80 vs. limit=22.5
2023-11-20 07:09:37,910 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.747e+01 7.925e+01 8.541e+01 9.436e+01 1.128e+02, threshold=1.708e+02, percent-clipped=0.0
2023-11-20 07:09:39,251 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 148400
2023-11-20 07:10:24,269 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 4150, loss[loss=0.0868, simple_loss=0.1156, pruned_loss=0.02011, audio_tagging_loss=0.008891, over 14847.00 frames. ], tot_loss[loss=0.08181, simple_loss=0.1024, pruned_loss=0.02039, audio_tagging_loss=0.01022, over 3033921.16 frames. ], batch size: 56, lr: 5.35e-03, grad_scale: 8.0
2023-11-20 07:10:41,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=989620.0, ans=0.125
2023-11-20 07:10:44,157 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 148450
2023-11-20 07:11:09,829 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 07:11:28,776 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 4200, loss[loss=0.06086, simple_loss=0.07024, pruned_loss=0.01474, audio_tagging_loss=0.011, over 14631.00 frames. ], tot_loss[loss=0.08209, simple_loss=0.1029, pruned_loss=0.02055, audio_tagging_loss=0.0101, over 3035719.53 frames. ], batch size: 56, lr: 5.35e-03, grad_scale: 8.0
2023-11-20 07:11:46,393 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.989e+01 8.148e+01 8.675e+01 9.910e+01 1.292e+02, threshold=1.735e+02, percent-clipped=0.0
2023-11-20 07:11:47,752 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 148500
2023-11-20 07:12:06,981 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.39 vs. limit=15.0
2023-11-20 07:12:09,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=990086.6666666666, ans=0.125
2023-11-20 07:12:12,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=990086.6666666666, ans=15.0
2023-11-20 07:12:21,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=990153.3333333334, ans=0.1
2023-11-20 07:12:32,958 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 4250, loss[loss=0.07843, simple_loss=0.09242, pruned_loss=0.02289, audio_tagging_loss=0.009326, over 15332.00 frames. ], tot_loss[loss=0.08199, simple_loss=0.1031, pruned_loss=0.02045, audio_tagging_loss=0.009975, over 3040059.62 frames. ], batch size: 57, lr: 5.35e-03, grad_scale: 8.0
2023-11-20 07:12:39,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=990220.0, ans=0.125
2023-11-20 07:12:46,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=990286.6666666666, ans=0.2
2023-11-20 07:12:52,627 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 148550
2023-11-20 07:13:08,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=990353.3333333334, ans=0.125
2023-11-20 07:13:26,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=990486.6666666666, ans=0.125
2023-11-20 07:13:38,286 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 4300, loss[loss=0.1004, simple_loss=0.1254, pruned_loss=0.02868, audio_tagging_loss=0.009002, over 14674.00 frames. ], tot_loss[loss=0.08191, simple_loss=0.103, pruned_loss=0.02055, audio_tagging_loss=0.009889, over 3043229.47 frames. ], batch size: 56, lr: 5.35e-03, grad_scale: 8.0
2023-11-20 07:13:48,786 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 07:13:56,610 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.158e+01 8.001e+01 8.705e+01 9.608e+01 1.261e+02, threshold=1.741e+02, percent-clipped=0.0
2023-11-20 07:13:57,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=990620.0, ans=0.0
2023-11-20 07:13:57,920 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 148600
2023-11-20 07:14:11,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=990686.6666666666, ans=0.125
2023-11-20 07:14:18,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=990753.3333333334, ans=0.125
2023-11-20 07:14:37,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=990820.0, ans=0.0
2023-11-20 07:14:39,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=990820.0, ans=0.1
2023-11-20 07:14:43,235 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 4350, loss[loss=0.05045, simple_loss=0.05823, pruned_loss=0.01039, audio_tagging_loss=0.01095, over 15141.00 frames. ], tot_loss[loss=0.08155, simple_loss=0.1024, pruned_loss=0.02042, audio_tagging_loss=0.009917, over 3036047.86 frames. ], batch size: 57, lr: 5.35e-03, grad_scale: 8.0
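The optim.py lines print a five-number summary (min/25%/50%/75%/max) of recently observed gradient norms, and in every such line of this section the printed threshold equals Clipping_scale times the median, e.g. 2.0 * 8.705e+01 = 1.741e+02 in the line above; percent-clipped then reports how often gradients exceeded that threshold. A hedged sketch of that relationship (an inference from the logged numbers, not the actual ScaledAdam clipping code):

    import torch

    # Sketch (assumption): threshold = Clipping_scale * median recent grad norm.
    def clip_threshold(recent_grad_norms: torch.Tensor,
                       clipping_scale: float = 2.0) -> float:
        return clipping_scale * recent_grad_norms.median().item()

    # The printed five-number summary of the same buffer:
    def five_number_summary(recent_grad_norms: torch.Tensor) -> torch.Tensor:
        q = torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0])
        return torch.quantile(recent_grad_norms, q)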
2023-11-20 07:14:43,844 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.57 vs. limit=15.0
2023-11-20 07:14:44,700 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 07:15:02,458 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 148650
2023-11-20 07:15:07,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=991020.0, ans=0.2
2023-11-20 07:15:22,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=991086.6666666666, ans=0.0
2023-11-20 07:15:37,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=991153.3333333334, ans=0.125
2023-11-20 07:15:38,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=991153.3333333334, ans=0.125
2023-11-20 07:15:45,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=991153.3333333334, ans=0.125
2023-11-20 07:15:48,010 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 4400, loss[loss=0.1012, simple_loss=0.1321, pruned_loss=0.02904, audio_tagging_loss=0.006102, over 14533.00 frames. ], tot_loss[loss=0.0816, simple_loss=0.1026, pruned_loss=0.02052, audio_tagging_loss=0.009776, over 3029901.18 frames. ], batch size: 54, lr: 5.34e-03, grad_scale: 16.0
2023-11-20 07:15:56,008 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.72 vs. limit=15.0
2023-11-20 07:16:05,879 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.660e+01 8.299e+01 8.915e+01 9.764e+01 1.212e+02, threshold=1.783e+02, percent-clipped=0.0
2023-11-20 07:16:07,889 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 148700
2023-11-20 07:16:17,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=991353.3333333334, ans=0.5
2023-11-20 07:16:40,589 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.27 vs. limit=22.5
2023-11-20 07:16:52,172 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 4450, loss[loss=0.1006, simple_loss=0.1154, pruned_loss=0.03448, audio_tagging_loss=0.008389, over 14765.00 frames. ], tot_loss[loss=0.08108, simple_loss=0.1015, pruned_loss=0.02043, audio_tagging_loss=0.009914, over 3040488.84 frames. ], batch size: 55, lr: 5.34e-03, grad_scale: 16.0
2023-11-20 07:17:02,809 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.44 vs. limit=10.0
2023-11-20 07:17:08,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=991620.0, ans=0.04949747468305833
2023-11-20 07:17:12,430 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 148750
2023-11-20 07:17:44,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=991820.0, ans=0.125
2023-11-20 07:17:44,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=991820.0, ans=0.2
2023-11-20 07:17:51,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=991820.0, ans=0.125
2023-11-20 07:17:55,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=991820.0, ans=0.125
2023-11-20 07:17:58,012 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 4500, loss[loss=0.07488, simple_loss=0.09556, pruned_loss=0.01488, audio_tagging_loss=0.01221, over 15112.00 frames. ], tot_loss[loss=0.08115, simple_loss=0.102, pruned_loss=0.02033, audio_tagging_loss=0.009832, over 3046526.61 frames. ], batch size: 56, lr: 5.34e-03, grad_scale: 16.0
2023-11-20 07:18:03,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=991886.6666666666, ans=0.0
2023-11-20 07:18:10,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=991953.3333333334, ans=0.125
2023-11-20 07:18:14,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=991953.3333333334, ans=0.2
2023-11-20 07:18:15,709 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.390e+01 8.916e+01 9.990e+01 1.363e+02, threshold=1.783e+02, percent-clipped=0.0
2023-11-20 07:18:17,037 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 148800
2023-11-20 07:18:32,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=992020.0, ans=0.0
2023-11-20 07:18:42,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=992086.6666666666, ans=0.125
2023-11-20 07:19:02,237 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.39 vs. limit=10.0
2023-11-20 07:19:02,811 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 4550, loss[loss=0.06564, simple_loss=0.0764, pruned_loss=0.01387, audio_tagging_loss=0.01356, over 14812.00 frames. ], tot_loss[loss=0.08001, simple_loss=0.1003, pruned_loss=0.01991, audio_tagging_loss=0.009925, over 3035240.32 frames. ], batch size: 56, lr: 5.34e-03, grad_scale: 16.0
2023-11-20 07:19:04,756 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.60 vs. limit=15.0
2023-11-20 07:19:07,186 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=5.96 vs. limit=15.0
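Each ScheduledFloat line above reports the current value (ans=...) of a named hyperparameter at the given batch_count; conceptually it is a float that follows a piecewise-linear schedule in batch_count, and by batch_count ~ 9.9e5 most schedules here have long since flattened at their final knot (skip rates at 0.0, balancer probs at 0.125, dropout at 0.1). A minimal sketch of such a schedule, with illustrative knot values rather than scaling.py's actual defaults:

    # Minimal sketch of a piecewise-linear schedule over batch_count
    # (assumption: the knots below are illustrative, not scaling.py's defaults).
    def scheduled_float(batch_count: float,
                        knots: list[tuple[float, float]]) -> float:
        if batch_count <= knots[0][0]:
            return knots[0][1]
        if batch_count >= knots[-1][0]:
            return knots[-1][1]
        for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
        raise AssertionError("knots must be sorted by batch_count")

    # e.g. a dropout decaying 0.3 -> 0.1 over the first 20k batches:
    # scheduled_float(987620.0, [(0.0, 0.3), (20000.0, 0.1)]) == 0.1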
2023-11-20 07:19:15,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=992286.6666666666, ans=0.125
2023-11-20 07:19:19,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=992286.6666666666, ans=0.0
2023-11-20 07:19:21,522 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 148850
2023-11-20 07:19:34,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=992353.3333333334, ans=0.1
2023-11-20 07:19:45,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=992420.0, ans=0.1
2023-11-20 07:19:51,861 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 07:19:55,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=992486.6666666666, ans=0.2
2023-11-20 07:20:00,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=992486.6666666666, ans=0.1
2023-11-20 07:20:06,661 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 4600, loss[loss=0.09052, simple_loss=0.102, pruned_loss=0.02286, audio_tagging_loss=0.01668, over 14782.00 frames. ], tot_loss[loss=0.08062, simple_loss=0.1013, pruned_loss=0.02008, audio_tagging_loss=0.009884, over 3039676.78 frames. ], batch size: 55, lr: 5.34e-03, grad_scale: 16.0
2023-11-20 07:20:14,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=992553.3333333334, ans=0.1
2023-11-20 07:20:25,853 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.427e+01 8.263e+01 8.928e+01 9.927e+01 1.394e+02, threshold=1.786e+02, percent-clipped=0.0
2023-11-20 07:20:27,170 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 148900
2023-11-20 07:21:11,585 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 4650, loss[loss=0.08555, simple_loss=0.1121, pruned_loss=0.0183, audio_tagging_loss=0.01121, over 15281.00 frames. ], tot_loss[loss=0.0808, simple_loss=0.1014, pruned_loss=0.0201, audio_tagging_loss=0.009986, over 3049455.60 frames. ], batch size: 60, lr: 5.34e-03, grad_scale: 16.0
2023-11-20 07:21:14,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=992886.6666666666, ans=0.0
2023-11-20 07:21:22,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=992886.6666666666, ans=0.0
2023-11-20 07:21:31,213 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 148950
2023-11-20 07:22:12,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=993153.3333333334, ans=0.125
2023-11-20 07:22:16,904 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 4700, loss[loss=0.07726, simple_loss=0.09697, pruned_loss=0.01974, audio_tagging_loss=0.009028, over 15091.00 frames. ], tot_loss[loss=0.08179, simple_loss=0.1027, pruned_loss=0.02045, audio_tagging_loss=0.00998, over 3052344.98 frames. ], batch size: 55, lr: 5.34e-03, grad_scale: 16.0
2023-11-20 07:22:34,027 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.986e+01 8.015e+01 8.729e+01 9.415e+01 1.258e+02, threshold=1.746e+02, percent-clipped=0.0
2023-11-20 07:22:35,418 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 149000
2023-11-20 07:22:41,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=993353.3333333334, ans=0.1
2023-11-20 07:22:49,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=993353.3333333334, ans=0.0
2023-11-20 07:22:53,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=993353.3333333334, ans=0.0
2023-11-20 07:23:09,890 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.71 vs. limit=12.0
2023-11-20 07:23:21,489 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 4750, loss[loss=0.0939, simple_loss=0.1136, pruned_loss=0.0267, audio_tagging_loss=0.01039, over 15301.00 frames. ], tot_loss[loss=0.08114, simple_loss=0.1014, pruned_loss=0.02027, audio_tagging_loss=0.01016, over 3056684.63 frames. ], batch size: 54, lr: 5.34e-03, grad_scale: 16.0
2023-11-20 07:23:32,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=993553.3333333334, ans=0.0
2023-11-20 07:23:32,578 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.82 vs. limit=22.5
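The slowly decaying learning rate (5.36e-03 down to 5.32e-03 across this section) is consistent with an Eden-style schedule driven by base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 from the run configuration. A hedged sketch follows; the exact optimizer code, and the precise epoch/batch counters fed to it, are not printed in the log:

    # Sketch of an Eden-style LR schedule (assumption: reconstructed from the
    # run's hyperparameters, not copied from icefall's optim.py).
    def eden_lr(base_lr: float, batch: float, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        return (base_lr
                * ((batch / lr_batches) ** 2 + 1.0) ** -0.25
                * ((epoch / lr_epochs) ** 2 + 1.0) ** -0.25)

    # eden_lr(0.045, batch=148100, epoch=12.0) ~ 5.35e-03, roughly the lr
    # printed in the surrounding lines (the exact epoch argument is a guess).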
2023-11-20 07:23:38,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=993620.0, ans=0.125
2023-11-20 07:23:41,209 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 149050
2023-11-20 07:23:51,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=993686.6666666666, ans=0.1
2023-11-20 07:24:13,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=993820.0, ans=0.125
2023-11-20 07:24:15,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=993820.0, ans=0.1
2023-11-20 07:24:15,708 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.41 vs. limit=15.0
2023-11-20 07:24:25,451 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 4800, loss[loss=0.0904, simple_loss=0.1152, pruned_loss=0.02131, audio_tagging_loss=0.0115, over 16694.00 frames. ], tot_loss[loss=0.08101, simple_loss=0.1013, pruned_loss=0.02008, audio_tagging_loss=0.01027, over 3057857.42 frames. ], batch size: 62, lr: 5.34e-03, grad_scale: 32.0
2023-11-20 07:24:33,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=993886.6666666666, ans=0.2
2023-11-20 07:24:44,658 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.552e+01 8.333e+01 8.832e+01 9.618e+01 1.335e+02, threshold=1.766e+02, percent-clipped=0.0
2023-11-20 07:24:46,056 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 149100
2023-11-20 07:24:53,854 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=17.94 vs. limit=15.0
2023-11-20 07:25:31,590 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 4850, loss[loss=0.08792, simple_loss=0.1126, pruned_loss=0.02181, audio_tagging_loss=0.009815, over 15613.00 frames. ], tot_loss[loss=0.08164, simple_loss=0.102, pruned_loss=0.02032, audio_tagging_loss=0.01034, over 3052697.98 frames. ], batch size: 57, lr: 5.34e-03, grad_scale: 32.0
2023-11-20 07:25:36,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=994220.0, ans=0.0
2023-11-20 07:25:45,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=994286.6666666666, ans=0.125
2023-11-20 07:25:48,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=994286.6666666666, ans=0.125
2023-11-20 07:25:49,750 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 149150
2023-11-20 07:25:59,191 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.56 vs. limit=15.0
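Most Whitening readings sit below their limits, but occasionally one overshoots (e.g. metric=17.94 vs. limit=15.0 a few lines above); the whitening modules only start pushing the activations back toward a white covariance once the metric exceeds its limit. A toy sketch of that one-sided behaviour (assumption: scaling.py's Whiten applies this at the gradient level rather than as an explicit loss term, and its exact form may differ):

    # Toy sketch (assumption): a penalty that is zero until the whiteness
    # metric exceeds its limit, then grows linearly.
    def whitening_penalty(metric: float, limit: float,
                          grad_scale: float = 0.01) -> float:
        return grad_scale * max(0.0, metric - limit)

    # whitening_penalty(17.94, 15.0) > 0   -> this module gets nudged
    # whitening_penalty(10.56, 15.0) == 0  -> no effect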
2023-11-20 07:26:09,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=994420.0, ans=0.125
2023-11-20 07:26:22,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=994486.6666666666, ans=0.2
2023-11-20 07:26:22,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=994486.6666666666, ans=0.0
2023-11-20 07:26:30,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=994486.6666666666, ans=0.0
2023-11-20 07:26:34,936 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 4900, loss[loss=0.06261, simple_loss=0.08114, pruned_loss=0.01483, audio_tagging_loss=0.007205, over 15285.00 frames. ], tot_loss[loss=0.08161, simple_loss=0.102, pruned_loss=0.02031, audio_tagging_loss=0.01031, over 3044609.35 frames. ], batch size: 60, lr: 5.34e-03, grad_scale: 32.0
2023-11-20 07:26:38,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=994553.3333333334, ans=0.0
2023-11-20 07:26:47,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=994620.0, ans=0.125
2023-11-20 07:26:52,654 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.754e+01 8.440e+01 9.252e+01 1.012e+02 1.571e+02, threshold=1.850e+02, percent-clipped=0.0
2023-11-20 07:26:54,705 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 149200
2023-11-20 07:27:00,398 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.61 vs. limit=15.0
2023-11-20 07:27:13,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=994753.3333333334, ans=0.07
2023-11-20 07:27:35,659 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.19 vs. limit=10.0
2023-11-20 07:27:39,751 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 4950, loss[loss=0.08185, simple_loss=0.109, pruned_loss=0.01781, audio_tagging_loss=0.009537, over 14603.00 frames. ], tot_loss[loss=0.08138, simple_loss=0.1021, pruned_loss=0.02023, audio_tagging_loss=0.01009, over 3047539.70 frames. ], batch size: 56, lr: 5.33e-03, grad_scale: 32.0
2023-11-20 07:27:49,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=994886.6666666666, ans=0.2
2023-11-20 07:27:59,574 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 149250
2023-11-20 07:28:11,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=995020.0, ans=0.125
2023-11-20 07:28:18,417 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.90 vs. limit=8.0
2023-11-20 07:28:36,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=995153.3333333334, ans=0.125
2023-11-20 07:28:36,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=995153.3333333334, ans=0.2
2023-11-20 07:28:39,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=995153.3333333334, ans=0.2
2023-11-20 07:28:45,201 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 5000, loss[loss=0.07854, simple_loss=0.1034, pruned_loss=0.01854, audio_tagging_loss=0.0083, over 15537.00 frames. ], tot_loss[loss=0.08079, simple_loss=0.1018, pruned_loss=0.01998, audio_tagging_loss=0.009928, over 3042925.00 frames. ], batch size: 56, lr: 5.33e-03, grad_scale: 16.0
2023-11-20 07:28:53,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=995220.0, ans=0.125
2023-11-20 07:28:54,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=995220.0, ans=0.0
2023-11-20 07:28:54,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=995220.0, ans=0.125
2023-11-20 07:29:02,082 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 07:29:04,094 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.403e+01 7.937e+01 8.592e+01 9.230e+01 1.200e+02, threshold=1.718e+02, percent-clipped=0.0
2023-11-20 07:29:04,237 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 149300
2023-11-20 07:29:12,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=995353.3333333334, ans=0.2
2023-11-20 07:29:14,326 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.38 vs. limit=15.0
2023-11-20 07:29:16,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=995353.3333333334, ans=0.125
2023-11-20 07:29:32,545 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0
2023-11-20 07:29:37,087 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=15.0
2023-11-20 07:29:49,397 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 5050, loss[loss=0.09247, simple_loss=0.1187, pruned_loss=0.02044, audio_tagging_loss=0.01269, over 16069.00 frames. ], tot_loss[loss=0.08103, simple_loss=0.1022, pruned_loss=0.02009, audio_tagging_loss=0.009833, over 3047319.16 frames. ], batch size: 59, lr: 5.33e-03, grad_scale: 16.0
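Note that tot_loss[...] is not the current batch's loss but an aggregate over recent batches; the fractional, slowly drifting frame counts ("over 3047319.16 frames") suggest a frame-weighted average with exponential forgetting rather than a plain sum. A guessed sketch of such bookkeeping (assumption; icefall's actual metrics tracking may aggregate differently):

    # Guessed sketch of a frame-weighted running average with forgetting
    # (assumption; not icefall's actual tracker).
    class RunningLoss:
        def __init__(self, decay: float = 0.999) -> None:
            self.decay = decay
            self.weighted_loss = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> None:
            self.weighted_loss = self.decay * self.weighted_loss + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames

        @property
        def value(self) -> float:
            return self.weighted_loss / max(self.frames, 1.0)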
2023-11-20 07:29:53,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=995553.3333333334, ans=0.125
2023-11-20 07:30:06,196 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 07:30:08,501 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 149350
2023-11-20 07:30:09,209 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.07 vs. limit=15.0
2023-11-20 07:30:30,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=995753.3333333334, ans=0.125
2023-11-20 07:30:31,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=995753.3333333334, ans=0.0
2023-11-20 07:30:40,564 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.49 vs. limit=15.0
2023-11-20 07:30:44,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=995820.0, ans=0.1
2023-11-20 07:30:54,242 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 5100, loss[loss=0.06662, simple_loss=0.08654, pruned_loss=0.01532, audio_tagging_loss=0.008038, over 15807.00 frames. ], tot_loss[loss=0.08095, simple_loss=0.102, pruned_loss=0.02011, audio_tagging_loss=0.009857, over 3047693.98 frames. ], batch size: 59, lr: 5.33e-03, grad_scale: 16.0
2023-11-20 07:31:13,984 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.044e+01 8.152e+01 8.992e+01 9.912e+01 1.301e+02, threshold=1.798e+02, percent-clipped=0.0
2023-11-20 07:31:14,152 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 149400
2023-11-20 07:31:15,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=995953.3333333334, ans=0.125
2023-11-20 07:31:15,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=995953.3333333334, ans=0.1
2023-11-20 07:31:33,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=996086.6666666666, ans=0.0
2023-11-20 07:31:45,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=996153.3333333334, ans=0.0
2023-11-20 07:31:49,953 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.32 vs. limit=5.0
2023-11-20 07:31:50,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=996153.3333333334, ans=0.125
2023-11-20 07:31:59,923 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 5150, loss[loss=0.05702, simple_loss=0.07098, pruned_loss=0.0125, audio_tagging_loss=0.009032, over 15736.00 frames. ], tot_loss[loss=0.08079, simple_loss=0.1019, pruned_loss=0.02005, audio_tagging_loss=0.009802, over 3059021.52 frames. ], batch size: 57, lr: 5.33e-03, grad_scale: 16.0
2023-11-20 07:32:03,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=996220.0, ans=0.0
2023-11-20 07:32:18,914 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 149450
2023-11-20 07:32:21,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=996286.6666666666, ans=0.2
2023-11-20 07:32:38,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=996420.0, ans=0.125
2023-11-20 07:32:42,889 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.58 vs. limit=15.0
2023-11-20 07:32:47,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=996420.0, ans=0.125
2023-11-20 07:33:03,246 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 5200, loss[loss=0.1088, simple_loss=0.1353, pruned_loss=0.03456, audio_tagging_loss=0.006606, over 15913.00 frames. ], tot_loss[loss=0.08131, simple_loss=0.1023, pruned_loss=0.02026, audio_tagging_loss=0.009884, over 3053527.58 frames. ], batch size: 58, lr: 5.33e-03, grad_scale: 32.0
2023-11-20 07:33:05,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=996553.3333333334, ans=0.2
2023-11-20 07:33:12,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=996553.3333333334, ans=0.2
2023-11-20 07:33:22,731 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.044e+01 8.127e+01 8.748e+01 9.361e+01 1.233e+02, threshold=1.750e+02, percent-clipped=0.0
2023-11-20 07:33:22,896 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 149500
2023-11-20 07:33:23,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=996620.0, ans=0.0
2023-11-20 07:33:51,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=996753.3333333334, ans=0.1
2023-11-20 07:34:07,806 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 5250, loss[loss=0.06868, simple_loss=0.07808, pruned_loss=0.0181, audio_tagging_loss=0.01154, over 13926.00 frames. ], tot_loss[loss=0.08111, simple_loss=0.1024, pruned_loss=0.02013, audio_tagging_loss=0.009792, over 3047028.93 frames. ], batch size: 57, lr: 5.33e-03, grad_scale: 16.0
2023-11-20 07:34:28,157 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 149550
2023-11-20 07:34:33,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=997020.0, ans=0.125
2023-11-20 07:34:39,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=997020.0, ans=0.125
2023-11-20 07:34:53,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=997086.6666666666, ans=0.1
2023-11-20 07:35:00,672 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.83 vs. limit=15.0
2023-11-20 07:35:08,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=997153.3333333334, ans=0.0
2023-11-20 07:35:13,105 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 5300, loss[loss=0.07695, simple_loss=0.09026, pruned_loss=0.01893, audio_tagging_loss=0.0129, over 14610.00 frames. ], tot_loss[loss=0.08127, simple_loss=0.1026, pruned_loss=0.02017, audio_tagging_loss=0.009826, over 3043437.69 frames. ], batch size: 58, lr: 5.33e-03, grad_scale: 16.0
2023-11-20 07:35:13,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=997220.0, ans=0.0
2023-11-20 07:35:16,288 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.32 vs. limit=10.0
2023-11-20 07:35:26,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=997286.6666666666, ans=0.125
2023-11-20 07:35:32,178 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 149600
2023-11-20 07:35:32,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=997286.6666666666, ans=0.125
2023-11-20 07:35:33,242 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.168e+01 7.882e+01 8.704e+01 9.363e+01 1.429e+02, threshold=1.741e+02, percent-clipped=0.0
2023-11-20 07:35:54,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=997420.0, ans=0.125
2023-11-20 07:36:18,203 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 5350, loss[loss=0.09147, simple_loss=0.112, pruned_loss=0.0275, audio_tagging_loss=0.007947, over 14289.00 frames. ], tot_loss[loss=0.08129, simple_loss=0.1025, pruned_loss=0.02023, audio_tagging_loss=0.009833, over 3040143.32 frames. ], batch size: 53, lr: 5.33e-03, grad_scale: 16.0
2023-11-20 07:36:20,292 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.45 vs. limit=12.0
2023-11-20 07:36:37,666 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 149650
2023-11-20 07:36:43,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=997686.6666666666, ans=0.1
2023-11-20 07:36:44,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=997686.6666666666, ans=22.5
2023-11-20 07:36:53,693 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.43 vs. limit=10.0
2023-11-20 07:37:22,384 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 5400, loss[loss=0.113, simple_loss=0.1421, pruned_loss=0.03498, audio_tagging_loss=0.006948, over 15618.00 frames. ], tot_loss[loss=0.08155, simple_loss=0.1027, pruned_loss=0.02036, audio_tagging_loss=0.009834, over 3037022.16 frames. ], batch size: 56, lr: 5.33e-03, grad_scale: 16.0
2023-11-20 07:37:39,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=997953.3333333334, ans=0.2
2023-11-20 07:37:41,838 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 149700
2023-11-20 07:37:42,894 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.237e+01 8.118e+01 8.667e+01 9.522e+01 1.475e+02, threshold=1.733e+02, percent-clipped=0.0
2023-11-20 07:37:46,538 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.44 vs. limit=15.0
2023-11-20 07:37:55,223 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.87 vs. limit=6.0
2023-11-20 07:38:03,311 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-20 07:38:26,846 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 5450, loss[loss=0.08782, simple_loss=0.1055, pruned_loss=0.02368, audio_tagging_loss=0.01137, over 16072.00 frames. ], tot_loss[loss=0.0813, simple_loss=0.1021, pruned_loss=0.02031, audio_tagging_loss=0.009953, over 3036960.68 frames. ], batch size: 60, lr: 5.33e-03, grad_scale: 8.0
2023-11-20 07:38:38,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=998286.6666666666, ans=0.0
2023-11-20 07:38:41,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=998286.6666666666, ans=0.1
2023-11-20 07:38:43,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=998286.6666666666, ans=0.02
2023-11-20 07:38:44,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=998286.6666666666, ans=0.125
2023-11-20 07:38:45,961 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 149750
2023-11-20 07:38:47,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=998286.6666666666, ans=0.125
2023-11-20 07:38:53,727 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.70 vs. limit=22.5
2023-11-20 07:38:54,788 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=15.0
2023-11-20 07:39:20,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=998486.6666666666, ans=0.2
2023-11-20 07:39:30,988 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 5500, loss[loss=0.09558, simple_loss=0.118, pruned_loss=0.0285, audio_tagging_loss=0.008079, over 15966.00 frames. ], tot_loss[loss=0.08141, simple_loss=0.1021, pruned_loss=0.02032, audio_tagging_loss=0.01003, over 3038383.88 frames. ], batch size: 57, lr: 5.32e-03, grad_scale: 8.0
2023-11-20 07:39:32,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=998553.3333333334, ans=0.0
2023-11-20 07:39:33,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=998553.3333333334, ans=0.1
2023-11-20 07:39:48,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=998620.0, ans=0.125
2023-11-20 07:39:49,343 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 149800
2023-11-20 07:39:49,910 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.98 vs. limit=15.0
2023-11-20 07:39:52,070 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.523e+01 8.049e+01 8.675e+01 9.506e+01 1.252e+02, threshold=1.735e+02, percent-clipped=0.0
2023-11-20 07:39:54,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=998620.0, ans=0.0
2023-11-20 07:39:57,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=998686.6666666666, ans=0.1
2023-11-20 07:40:00,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=998686.6666666666, ans=0.125
2023-11-20 07:40:02,640 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.32 vs. limit=15.0
2023-11-20 07:40:22,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=998820.0, ans=0.125
2023-11-20 07:40:26,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=998820.0, ans=0.0
2023-11-20 07:40:34,394 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 5550, loss[loss=0.1007, simple_loss=0.1281, pruned_loss=0.02484, audio_tagging_loss=0.01182, over 14513.00 frames. ], tot_loss[loss=0.08168, simple_loss=0.1023, pruned_loss=0.02039, audio_tagging_loss=0.01013, over 3041014.85 frames. ], batch size: 53, lr: 5.32e-03, grad_scale: 8.0
2023-11-20 07:40:52,576 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=22.5
2023-11-20 07:40:54,565 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 149850
2023-11-20 07:40:59,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=999020.0, ans=0.1
2023-11-20 07:41:07,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=999020.0, ans=0.0
2023-11-20 07:41:11,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=999020.0, ans=0.0
2023-11-20 07:41:25,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=999153.3333333334, ans=0.0
2023-11-20 07:41:28,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=999153.3333333334, ans=0.125
2023-11-20 07:41:29,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=999153.3333333334, ans=0.1
2023-11-20 07:41:40,122 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 5600, loss[loss=0.0837, simple_loss=0.101, pruned_loss=0.02342, audio_tagging_loss=0.009809, over 16544.00 frames. ], tot_loss[loss=0.08073, simple_loss=0.1008, pruned_loss=0.02, audio_tagging_loss=0.01031, over 3039608.95 frames. ], batch size: 61, lr: 5.32e-03, grad_scale: 16.0
2023-11-20 07:41:42,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=999220.0, ans=0.125
2023-11-20 07:41:59,462 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 149900
2023-11-20 07:42:01,823 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.809e+01 8.057e+01 8.712e+01 9.430e+01 1.236e+02, threshold=1.742e+02, percent-clipped=0.0
2023-11-20 07:42:04,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=999353.3333333334, ans=0.125
2023-11-20 07:42:19,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=999420.0, ans=0.1
2023-11-20 07:42:24,350 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 07:42:30,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=999486.6666666666, ans=0.125
2023-11-20 07:42:32,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=999486.6666666666, ans=15.0
2023-11-20 07:42:44,617 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 5650, loss[loss=0.05737, simple_loss=0.06723, pruned_loss=0.01164, audio_tagging_loss=0.01212, over 16495.00 frames. ], tot_loss[loss=0.08109, simple_loss=0.1013, pruned_loss=0.02012, audio_tagging_loss=0.0103, over 3045091.57 frames. ], batch size: 62, lr: 5.32e-03, grad_scale: 16.0
2023-11-20 07:42:53,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=999553.3333333334, ans=0.125
2023-11-20 07:42:54,149 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.61 vs. limit=15.0
2023-11-20 07:42:55,014 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.20 vs. limit=15.0
2023-11-20 07:43:03,285 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 149950
2023-11-20 07:43:18,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=999686.6666666666, ans=0.125
2023-11-20 07:43:48,657 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 5700, loss[loss=0.08916, simple_loss=0.1027, pruned_loss=0.02386, audio_tagging_loss=0.01395, over 14557.00 frames. ], tot_loss[loss=0.08069, simple_loss=0.1005, pruned_loss=0.02003, audio_tagging_loss=0.01043, over 3046911.59 frames. ], batch size: 54, lr: 5.32e-03, grad_scale: 16.0
2023-11-20 07:44:03,639 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.22 vs. limit=15.0
2023-11-20 07:44:08,582 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 150000
2023-11-20 07:44:11,212 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.782e+01 7.942e+01 8.498e+01 9.233e+01 1.252e+02, threshold=1.700e+02, percent-clipped=0.0
2023-11-20 07:44:17,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1000020.0, ans=0.0
2023-11-20 07:44:20,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1000020.0, ans=0.5
2023-11-20 07:44:44,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1000153.3333333334, ans=0.125
2023-11-20 07:44:47,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1000153.3333333334, ans=0.125
2023-11-20 07:44:53,182 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 5750, loss[loss=0.1113, simple_loss=0.1166, pruned_loss=0.0408, audio_tagging_loss=0.01224, over 14827.00 frames. ], tot_loss[loss=0.08054, simple_loss=0.1003, pruned_loss=0.02011, audio_tagging_loss=0.01026, over 3040811.43 frames. ], batch size: 54, lr: 5.32e-03, grad_scale: 16.0
2023-11-20 07:45:13,416 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 150050
2023-11-20 07:45:17,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1000286.6666666666, ans=0.2
2023-11-20 07:45:19,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1000353.3333333334, ans=0.125
2023-11-20 07:45:25,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1000353.3333333334, ans=0.0
2023-11-20 07:45:33,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1000420.0, ans=0.125
2023-11-20 07:45:33,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1000420.0, ans=0.2
2023-11-20 07:45:55,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1000486.6666666666, ans=0.1
2023-11-20 07:45:57,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1000553.3333333334, ans=0.125
2023-11-20 07:45:58,341 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 5800, loss[loss=0.1003, simple_loss=0.1174, pruned_loss=0.03066, audio_tagging_loss=0.01091, over 15749.00 frames. ], tot_loss[loss=0.08036, simple_loss=0.09992, pruned_loss=0.02023, audio_tagging_loss=0.01017, over 3036274.06 frames. ], batch size: 58, lr: 5.32e-03, grad_scale: 16.0
2023-11-20 07:46:04,006 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.02 vs. limit=6.0
2023-11-20 07:46:08,564 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 07:46:09,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1000620.0, ans=0.0
2023-11-20 07:46:15,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1000620.0, ans=0.0
2023-11-20 07:46:16,864 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 150100
2023-11-20 07:46:18,618 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.69 vs. limit=10.0
2023-11-20 07:46:19,209 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.622e+01 8.139e+01 8.776e+01 9.398e+01 1.793e+02, threshold=1.755e+02, percent-clipped=1.0
2023-11-20 07:46:30,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1000686.6666666666, ans=0.125
2023-11-20 07:46:34,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1000686.6666666666, ans=0.1
2023-11-20 07:46:38,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1000753.3333333334, ans=0.125
2023-11-20 07:46:43,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1000753.3333333334, ans=0.125
2023-11-20 07:46:54,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1000820.0, ans=0.2
2023-11-20 07:47:01,814 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 5850, loss[loss=0.09793, simple_loss=0.1313, pruned_loss=0.02637, audio_tagging_loss=0.005927, over 15699.00 frames. ], tot_loss[loss=0.08124, simple_loss=0.1013, pruned_loss=0.02061, audio_tagging_loss=0.009966, over 3038357.59 frames. ], batch size: 55, lr: 5.32e-03, grad_scale: 16.0
2023-11-20 07:47:06,866 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 07:47:16,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1000953.3333333334, ans=0.125
2023-11-20 07:47:20,813 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 150150
2023-11-20 07:47:22,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1000953.3333333334, ans=0.125
2023-11-20 07:47:46,414 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.49 vs. limit=15.0
2023-11-20 07:48:05,892 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 5900, loss[loss=0.09627, simple_loss=0.1259, pruned_loss=0.02395, audio_tagging_loss=0.009382, over 15063.00 frames. ], tot_loss[loss=0.08103, simple_loss=0.1014, pruned_loss=0.02045, audio_tagging_loss=0.009857, over 3037704.12 frames. ], batch size: 55, lr: 5.32e-03, grad_scale: 16.0
], batch size: 55, lr: 5.32e-03, grad_scale: 16.0 2023-11-20 07:48:12,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1001220.0, ans=0.125 2023-11-20 07:48:23,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1001286.6666666666, ans=0.0 2023-11-20 07:48:24,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1001286.6666666666, ans=0.1 2023-11-20 07:48:25,989 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 150200 2023-11-20 07:48:28,738 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.681e+01 8.240e+01 8.857e+01 9.692e+01 1.369e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-20 07:48:30,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1001286.6666666666, ans=0.125 2023-11-20 07:48:32,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1001353.3333333334, ans=0.125 2023-11-20 07:48:40,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1001353.3333333334, ans=0.1 2023-11-20 07:48:46,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1001420.0, ans=0.125 2023-11-20 07:49:03,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1001486.6666666666, ans=0.0 2023-11-20 07:49:09,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1001553.3333333334, ans=0.125 2023-11-20 07:49:10,180 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 5950, loss[loss=0.09205, simple_loss=0.1203, pruned_loss=0.02752, audio_tagging_loss=0.004389, over 14365.00 frames. ], tot_loss[loss=0.08089, simple_loss=0.1013, pruned_loss=0.02039, audio_tagging_loss=0.009841, over 3043461.89 frames. ], batch size: 55, lr: 5.32e-03, grad_scale: 16.0 2023-11-20 07:49:22,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1001620.0, ans=0.125 2023-11-20 07:49:25,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1001620.0, ans=0.2 2023-11-20 07:49:25,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1001620.0, ans=0.1 2023-11-20 07:49:29,852 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 150250 2023-11-20 07:49:40,308 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. 
limit=6.0 2023-11-20 07:49:44,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1001686.6666666666, ans=0.5 2023-11-20 07:49:44,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1001686.6666666666, ans=0.125 2023-11-20 07:49:56,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1001753.3333333334, ans=0.1 2023-11-20 07:50:14,430 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 6000, loss[loss=0.08257, simple_loss=0.09877, pruned_loss=0.02205, audio_tagging_loss=0.01113, over 14817.00 frames. ], tot_loss[loss=0.08085, simple_loss=0.1016, pruned_loss=0.02024, audio_tagging_loss=0.009814, over 3046978.98 frames. ], batch size: 56, lr: 5.32e-03, grad_scale: 32.0 2023-11-20 07:50:14,434 INFO [train_asr.py:1285] (0/4) Computing validation loss 2023-11-20 07:50:55,339 INFO [train_asr.py:1294] (0/4) Epoch 13, validation: loss=0.06203, simple_loss=0.05394, pruned_loss=0.00581, audio_tagging_loss=0.02925, over 4681554.00 frames. 2023-11-20 07:50:55,340 INFO [train_asr.py:1295] (0/4) Maximum memory allocated so far is 25925MB 2023-11-20 07:51:10,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1001953.3333333334, ans=0.125 2023-11-20 07:51:15,058 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 150300 2023-11-20 07:51:17,454 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.658e+01 7.953e+01 8.643e+01 9.262e+01 1.038e+02, threshold=1.729e+02, percent-clipped=0.0 2023-11-20 07:51:17,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1001953.3333333334, ans=0.0 2023-11-20 07:51:24,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1002020.0, ans=0.05 2023-11-20 07:51:37,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1002086.6666666666, ans=0.125 2023-11-20 07:51:41,298 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 07:51:47,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1002153.3333333334, ans=0.0 2023-11-20 07:51:51,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1002153.3333333334, ans=0.2 2023-11-20 07:51:52,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1002153.3333333334, ans=0.125 2023-11-20 07:51:53,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1002153.3333333334, ans=0.125 2023-11-20 07:51:59,893 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 6050, loss[loss=0.09422, simple_loss=0.1226, pruned_loss=0.02512, audio_tagging_loss=0.007781, over 15732.00 frames. ], tot_loss[loss=0.08119, simple_loss=0.1019, pruned_loss=0.02042, audio_tagging_loss=0.009822, over 3049302.82 frames. ], batch size: 58, lr: 5.32e-03, grad_scale: 16.0 2023-11-20 07:52:01,325 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 07:52:03,029 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.45 vs. limit=6.0 2023-11-20 07:52:14,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1002286.6666666666, ans=0.125 2023-11-20 07:52:18,923 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 150350 2023-11-20 07:52:19,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1002286.6666666666, ans=0.125 2023-11-20 07:52:29,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1002353.3333333334, ans=0.0 2023-11-20 07:52:44,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1002420.0, ans=0.1 2023-11-20 07:53:00,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1002486.6666666666, ans=0.5 2023-11-20 07:53:03,909 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 6100, loss[loss=0.1194, simple_loss=0.1534, pruned_loss=0.03161, audio_tagging_loss=0.01107, over 16208.00 frames. ], tot_loss[loss=0.08097, simple_loss=0.1015, pruned_loss=0.02034, audio_tagging_loss=0.009882, over 3051224.20 frames. ], batch size: 53, lr: 5.31e-03, grad_scale: 16.0 2023-11-20 07:53:12,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1002553.3333333334, ans=0.0 2023-11-20 07:53:23,640 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 150400 2023-11-20 07:53:27,494 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.228e+01 8.482e+01 9.288e+01 1.055e+02 1.647e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-20 07:53:32,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1002686.6666666666, ans=0.125 2023-11-20 07:54:08,374 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 6150, loss[loss=0.06613, simple_loss=0.08582, pruned_loss=0.01316, audio_tagging_loss=0.01006, over 14931.00 frames. 
], tot_loss[loss=0.08053, simple_loss=0.1008, pruned_loss=0.02023, audio_tagging_loss=0.009891, over 3055225.20 frames. ], batch size: 57, lr: 5.31e-03, grad_scale: 16.0 2023-11-20 07:54:26,715 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.55 vs. limit=10.0 2023-11-20 07:54:28,045 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 150450 2023-11-20 07:54:33,783 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.28 vs. limit=22.5 2023-11-20 07:54:34,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1003020.0, ans=0.0 2023-11-20 07:54:36,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1003020.0, ans=0.125 2023-11-20 07:54:42,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1003020.0, ans=0.0 2023-11-20 07:54:58,218 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.64 vs. limit=22.5 2023-11-20 07:55:05,768 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.215e-03 2023-11-20 07:55:12,837 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 6200, loss[loss=0.09247, simple_loss=0.1274, pruned_loss=0.02006, audio_tagging_loss=0.008703, over 14310.00 frames. ], tot_loss[loss=0.08038, simple_loss=0.1008, pruned_loss=0.02003, audio_tagging_loss=0.009958, over 3052653.52 frames. ], batch size: 53, lr: 5.31e-03, grad_scale: 16.0 2023-11-20 07:55:32,256 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 150500 2023-11-20 07:55:35,800 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.566e+01 8.052e+01 8.616e+01 9.415e+01 1.229e+02, threshold=1.723e+02, percent-clipped=0.0 2023-11-20 07:56:04,605 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.06 vs. limit=22.5 2023-11-20 07:56:13,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1003486.6666666666, ans=0.125 2023-11-20 07:56:17,650 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 6250, loss[loss=0.05026, simple_loss=0.05512, pruned_loss=0.01177, audio_tagging_loss=0.01093, over 14977.00 frames. ], tot_loss[loss=0.08064, simple_loss=0.101, pruned_loss=0.02005, audio_tagging_loss=0.0101, over 3054742.45 frames. ], batch size: 60, lr: 5.31e-03, grad_scale: 16.0 2023-11-20 07:56:37,319 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 150550 2023-11-20 07:56:56,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1003753.3333333334, ans=0.1 2023-11-20 07:57:21,464 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 6300, loss[loss=0.081, simple_loss=0.09587, pruned_loss=0.02093, audio_tagging_loss=0.01214, over 15105.00 frames. ], tot_loss[loss=0.08034, simple_loss=0.1006, pruned_loss=0.01986, audio_tagging_loss=0.01018, over 3052537.64 frames. 
], batch size: 59, lr: 5.31e-03, grad_scale: 16.0 2023-11-20 07:57:28,114 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.10 vs. limit=15.0 2023-11-20 07:57:41,449 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 150600 2023-11-20 07:57:45,551 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.750e+01 8.528e+01 9.126e+01 1.017e+02 1.272e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-20 07:58:27,048 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 6350, loss[loss=0.06396, simple_loss=0.07951, pruned_loss=0.01113, audio_tagging_loss=0.01308, over 14477.00 frames. ], tot_loss[loss=0.07965, simple_loss=0.09953, pruned_loss=0.01961, audio_tagging_loss=0.01028, over 3052478.16 frames. ], batch size: 54, lr: 5.31e-03, grad_scale: 16.0 2023-11-20 07:58:46,476 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 150650 2023-11-20 07:59:00,867 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 07:59:06,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=1004420.0, ans=15.0 2023-11-20 07:59:14,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1004420.0, ans=0.0 2023-11-20 07:59:18,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1004486.6666666666, ans=0.0 2023-11-20 07:59:32,172 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 6400, loss[loss=0.07731, simple_loss=0.09219, pruned_loss=0.01765, audio_tagging_loss=0.01357, over 15661.00 frames. ], tot_loss[loss=0.08027, simple_loss=0.1003, pruned_loss=0.01973, audio_tagging_loss=0.01036, over 3053888.52 frames. ], batch size: 59, lr: 5.31e-03, grad_scale: 32.0 2023-11-20 07:59:41,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1004553.3333333334, ans=0.125 2023-11-20 07:59:51,890 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 150700 2023-11-20 07:59:56,014 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.650e+01 8.100e+01 8.750e+01 9.554e+01 1.310e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-20 08:00:03,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=1004686.6666666666, ans=15.0 2023-11-20 08:00:11,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1004753.3333333334, ans=0.1 2023-11-20 08:00:12,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=1004753.3333333334, ans=0.025 2023-11-20 08:00:36,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1004886.6666666666, ans=0.2 2023-11-20 08:00:37,044 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 6450, loss[loss=0.07381, simple_loss=0.08863, pruned_loss=0.01785, audio_tagging_loss=0.01164, over 14789.00 frames. ], tot_loss[loss=0.08092, simple_loss=0.1011, pruned_loss=0.02004, audio_tagging_loss=0.01033, over 3051198.17 frames. 
], batch size: 56, lr: 5.31e-03, grad_scale: 32.0 2023-11-20 08:00:56,829 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 150750 2023-11-20 08:00:56,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1004953.3333333334, ans=0.2 2023-11-20 08:01:08,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1005020.0, ans=0.015 2023-11-20 08:01:08,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1005020.0, ans=0.125 2023-11-20 08:01:14,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1005086.6666666666, ans=0.0 2023-11-20 08:01:17,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1005086.6666666666, ans=0.04949747468305833 2023-11-20 08:01:42,380 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 6500, loss[loss=0.105, simple_loss=0.1219, pruned_loss=0.03142, audio_tagging_loss=0.01268, over 14421.00 frames. ], tot_loss[loss=0.08114, simple_loss=0.1014, pruned_loss=0.02013, audio_tagging_loss=0.01029, over 3045390.02 frames. ], batch size: 54, lr: 5.31e-03, grad_scale: 32.0 2023-11-20 08:02:00,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1005286.6666666666, ans=0.125 2023-11-20 08:02:01,557 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 150800 2023-11-20 08:02:05,448 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.846e+01 8.112e+01 8.845e+01 9.450e+01 1.236e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-20 08:02:27,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1005420.0, ans=0.0 2023-11-20 08:02:32,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1005420.0, ans=0.1 2023-11-20 08:02:36,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=1005486.6666666666, ans=15.0 2023-11-20 08:02:46,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1005553.3333333334, ans=0.0 2023-11-20 08:02:47,593 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 6550, loss[loss=0.0569, simple_loss=0.06702, pruned_loss=0.01068, audio_tagging_loss=0.01271, over 15731.00 frames. ], tot_loss[loss=0.08027, simple_loss=0.1006, pruned_loss=0.01978, audio_tagging_loss=0.01018, over 3037320.35 frames. 
], batch size: 61, lr: 5.31e-03, grad_scale: 32.0 2023-11-20 08:02:50,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1005553.3333333334, ans=0.1 2023-11-20 08:03:06,847 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 150850 2023-11-20 08:03:14,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1005686.6666666666, ans=0.0 2023-11-20 08:03:25,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1005753.3333333334, ans=0.2 2023-11-20 08:03:29,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1005753.3333333334, ans=0.125 2023-11-20 08:03:49,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1005820.0, ans=0.09899494936611666 2023-11-20 08:03:51,447 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 6600, loss[loss=0.1304, simple_loss=0.1637, pruned_loss=0.04092, audio_tagging_loss=0.00758, over 15199.00 frames. ], tot_loss[loss=0.08088, simple_loss=0.1014, pruned_loss=0.02015, audio_tagging_loss=0.01003, over 3041235.17 frames. ], batch size: 54, lr: 5.31e-03, grad_scale: 16.0 2023-11-20 08:03:53,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1005886.6666666666, ans=0.125 2023-11-20 08:03:55,883 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 08:04:03,941 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.53 vs. limit=22.5 2023-11-20 08:04:06,751 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.12 vs. limit=15.0 2023-11-20 08:04:12,208 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 150900 2023-11-20 08:04:13,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1005953.3333333334, ans=0.0 2023-11-20 08:04:13,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1005953.3333333334, ans=0.125 2023-11-20 08:04:16,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1005953.3333333334, ans=0.1 2023-11-20 08:04:16,935 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.805e+01 8.131e+01 8.865e+01 9.580e+01 1.182e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-20 08:04:29,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1006020.0, ans=0.1 2023-11-20 08:04:34,169 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.22 vs. limit=15.0 2023-11-20 08:04:57,518 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 6650, loss[loss=0.07986, simple_loss=0.0993, pruned_loss=0.02077, audio_tagging_loss=0.009433, over 16035.00 frames. ], tot_loss[loss=0.08, simple_loss=0.1003, pruned_loss=0.01981, audio_tagging_loss=0.01004, over 3041332.26 frames. 
], batch size: 60, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:05:16,676 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 150950 2023-11-20 08:05:41,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=1006420.0, ans=15.0 2023-11-20 08:05:50,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1006486.6666666666, ans=0.125 2023-11-20 08:05:51,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1006486.6666666666, ans=0.125 2023-11-20 08:05:54,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1006486.6666666666, ans=0.125 2023-11-20 08:06:01,400 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 6700, loss[loss=0.08958, simple_loss=0.1025, pruned_loss=0.02691, audio_tagging_loss=0.0114, over 15160.00 frames. ], tot_loss[loss=0.07982, simple_loss=0.09995, pruned_loss=0.01984, audio_tagging_loss=0.01001, over 3045098.45 frames. ], batch size: 58, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:06:11,958 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.96 vs. limit=12.0 2023-11-20 08:06:15,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1006620.0, ans=0.0 2023-11-20 08:06:17,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1006620.0, ans=0.2 2023-11-20 08:06:19,930 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 151000 2023-11-20 08:06:22,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1006620.0, ans=0.04949747468305833 2023-11-20 08:06:23,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1006620.0, ans=0.125 2023-11-20 08:06:25,051 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.540e+01 7.744e+01 8.706e+01 9.471e+01 1.164e+02, threshold=1.741e+02, percent-clipped=0.0 2023-11-20 08:06:25,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1006686.6666666666, ans=0.125 2023-11-20 08:06:46,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1006753.3333333334, ans=0.125 2023-11-20 08:07:05,432 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 6750, loss[loss=0.08908, simple_loss=0.1152, pruned_loss=0.0201, audio_tagging_loss=0.01138, over 16464.00 frames. ], tot_loss[loss=0.07977, simple_loss=0.09998, pruned_loss=0.01977, audio_tagging_loss=0.01001, over 3049468.90 frames. ], batch size: 63, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:07:10,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1006886.6666666666, ans=0.0 2023-11-20 08:07:25,121 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 151050 2023-11-20 08:07:29,366 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.52 vs. 
limit=22.5 2023-11-20 08:07:35,179 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 08:08:10,114 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 6800, loss[loss=0.06206, simple_loss=0.07982, pruned_loss=0.01299, audio_tagging_loss=0.009163, over 14959.00 frames. ], tot_loss[loss=0.08011, simple_loss=0.1004, pruned_loss=0.01994, audio_tagging_loss=0.00997, over 3049812.82 frames. ], batch size: 57, lr: 5.30e-03, grad_scale: 32.0 2023-11-20 08:08:16,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1007220.0, ans=0.125 2023-11-20 08:08:16,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1007220.0, ans=0.1 2023-11-20 08:08:29,254 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 151100 2023-11-20 08:08:33,972 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.708e+01 8.041e+01 8.769e+01 9.846e+01 1.398e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-20 08:08:43,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1007353.3333333334, ans=0.125 2023-11-20 08:08:55,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1007420.0, ans=0.2 2023-11-20 08:09:05,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1007486.6666666666, ans=0.0 2023-11-20 08:09:13,929 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 6850, loss[loss=0.07086, simple_loss=0.105, pruned_loss=0.0128, audio_tagging_loss=0.005569, over 16085.00 frames. ], tot_loss[loss=0.08046, simple_loss=0.1012, pruned_loss=0.02001, audio_tagging_loss=0.009863, over 3047999.98 frames. 
], batch size: 58, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:09:17,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1007553.3333333334, ans=0.0 2023-11-20 08:09:20,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1007553.3333333334, ans=0.125 2023-11-20 08:09:32,365 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 151150 2023-11-20 08:09:50,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1007686.6666666666, ans=0.0 2023-11-20 08:10:09,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1007820.0, ans=0.035 2023-11-20 08:10:09,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1007820.0, ans=0.1 2023-11-20 08:10:14,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1007820.0, ans=0.035 2023-11-20 08:10:14,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1007820.0, ans=0.125 2023-11-20 08:10:14,442 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 08:10:17,687 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 6900, loss[loss=0.08321, simple_loss=0.109, pruned_loss=0.01613, audio_tagging_loss=0.01259, over 15816.00 frames. ], tot_loss[loss=0.0803, simple_loss=0.101, pruned_loss=0.01984, audio_tagging_loss=0.009938, over 3054580.74 frames. ], batch size: 57, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:10:37,661 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 151200 2023-11-20 08:10:41,625 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.66 vs. limit=6.0 2023-11-20 08:10:44,630 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 8.370e+01 9.169e+01 1.032e+02 1.396e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-20 08:11:08,960 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 08:11:12,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1008153.3333333334, ans=0.125 2023-11-20 08:11:22,938 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 6950, loss[loss=0.07789, simple_loss=0.1025, pruned_loss=0.01668, audio_tagging_loss=0.009964, over 15037.00 frames. ], tot_loss[loss=0.08123, simple_loss=0.1023, pruned_loss=0.02018, audio_tagging_loss=0.009902, over 3052449.96 frames. 
], batch size: 55, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:11:27,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1008220.0, ans=0.1 2023-11-20 08:11:38,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1008286.6666666666, ans=0.0 2023-11-20 08:11:43,501 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 151250 2023-11-20 08:11:46,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1008286.6666666666, ans=0.1 2023-11-20 08:11:49,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1008353.3333333334, ans=0.1 2023-11-20 08:11:50,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1008353.3333333334, ans=0.5 2023-11-20 08:11:59,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1008353.3333333334, ans=0.0 2023-11-20 08:12:27,857 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 7000, loss[loss=0.06759, simple_loss=0.08431, pruned_loss=0.01585, audio_tagging_loss=0.009588, over 13482.00 frames. ], tot_loss[loss=0.08061, simple_loss=0.101, pruned_loss=0.02006, audio_tagging_loss=0.01005, over 3048816.86 frames. ], batch size: 53, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:12:46,745 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 151300 2023-11-20 08:12:52,721 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.717e+01 8.209e+01 8.895e+01 9.636e+01 1.347e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-20 08:13:16,350 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.39 vs. limit=15.0 2023-11-20 08:13:26,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=1008820.0, ans=0.5 2023-11-20 08:13:32,213 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 7050, loss[loss=0.06554, simple_loss=0.08466, pruned_loss=0.01541, audio_tagging_loss=0.007806, over 15879.00 frames. ], tot_loss[loss=0.08042, simple_loss=0.1008, pruned_loss=0.01994, audio_tagging_loss=0.01008, over 3046339.14 frames. ], batch size: 59, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:13:33,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1008886.6666666666, ans=0.2 2023-11-20 08:13:51,853 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 151350 2023-11-20 08:13:54,812 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2023-11-20 08:13:55,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1008953.3333333334, ans=0.125 2023-11-20 08:13:58,840 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.47 vs. 
limit=22.5 2023-11-20 08:14:06,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1009020.0, ans=0.125 2023-11-20 08:14:25,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1009153.3333333334, ans=0.2 2023-11-20 08:14:27,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1009153.3333333334, ans=0.125 2023-11-20 08:14:36,088 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 7100, loss[loss=0.08135, simple_loss=0.09643, pruned_loss=0.02242, audio_tagging_loss=0.01072, over 15405.00 frames. ], tot_loss[loss=0.08087, simple_loss=0.1014, pruned_loss=0.01999, audio_tagging_loss=0.01018, over 3044513.39 frames. ], batch size: 57, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:14:56,632 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 151400 2023-11-20 08:15:03,704 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.030e+01 8.148e+01 8.789e+01 9.448e+01 1.213e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-20 08:15:22,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1009420.0, ans=0.125 2023-11-20 08:15:29,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1009486.6666666666, ans=0.125 2023-11-20 08:15:36,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1009486.6666666666, ans=0.125 2023-11-20 08:15:38,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1009486.6666666666, ans=0.0 2023-11-20 08:15:41,836 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 7150, loss[loss=0.07697, simple_loss=0.1025, pruned_loss=0.01752, audio_tagging_loss=0.008226, over 15741.00 frames. ], tot_loss[loss=0.0814, simple_loss=0.102, pruned_loss=0.02022, audio_tagging_loss=0.01018, over 3053934.63 frames. ], batch size: 59, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:16:00,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1009620.0, ans=0.125 2023-11-20 08:16:01,901 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 151450 2023-11-20 08:16:11,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1009686.6666666666, ans=0.0 2023-11-20 08:16:14,991 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.55 vs. 
limit=15.0 2023-11-20 08:16:25,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1009753.3333333334, ans=0.1 2023-11-20 08:16:40,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1009820.0, ans=0.125 2023-11-20 08:16:42,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1009820.0, ans=0.125 2023-11-20 08:16:47,334 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 7200, loss[loss=0.06512, simple_loss=0.07851, pruned_loss=0.01563, audio_tagging_loss=0.01023, over 15128.00 frames. ], tot_loss[loss=0.08107, simple_loss=0.1013, pruned_loss=0.02017, audio_tagging_loss=0.01026, over 3050237.61 frames. ], batch size: 61, lr: 5.30e-03, grad_scale: 32.0 2023-11-20 08:16:51,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1009886.6666666666, ans=0.125 2023-11-20 08:17:06,110 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 151500 2023-11-20 08:17:13,436 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.859e+01 8.360e+01 9.161e+01 1.018e+02 1.279e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-20 08:17:35,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1010086.6666666666, ans=0.1 2023-11-20 08:17:48,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1010153.3333333334, ans=0.1 2023-11-20 08:17:50,997 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 7250, loss[loss=0.08425, simple_loss=0.1054, pruned_loss=0.02259, audio_tagging_loss=0.00897, over 14926.00 frames. ], tot_loss[loss=0.08101, simple_loss=0.1013, pruned_loss=0.02013, audio_tagging_loss=0.01025, over 3054126.95 frames. ], batch size: 56, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:18:01,159 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 08:18:10,774 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 151550 2023-11-20 08:18:16,961 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.51 vs. limit=22.5 2023-11-20 08:18:40,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1010420.0, ans=0.125 2023-11-20 08:18:44,760 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.80 vs. limit=10.0 2023-11-20 08:18:50,652 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.46 vs. limit=15.0 2023-11-20 08:18:55,914 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 7300, loss[loss=0.08384, simple_loss=0.1096, pruned_loss=0.02134, audio_tagging_loss=0.007722, over 14779.00 frames. ], tot_loss[loss=0.08086, simple_loss=0.1011, pruned_loss=0.02006, audio_tagging_loss=0.01027, over 3054352.78 frames. 
], batch size: 55, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:18:56,544 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.93 vs. limit=22.5 2023-11-20 08:19:15,962 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 151600 2023-11-20 08:19:21,893 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.79 vs. limit=22.5 2023-11-20 08:19:22,265 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.093e+01 8.295e+01 8.871e+01 9.520e+01 1.560e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-20 08:19:24,243 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.71 vs. limit=12.0 2023-11-20 08:19:40,239 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.96 vs. limit=15.0 2023-11-20 08:19:43,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1010753.3333333334, ans=0.125 2023-11-20 08:20:01,281 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 7350, loss[loss=0.07159, simple_loss=0.08961, pruned_loss=0.01676, audio_tagging_loss=0.01002, over 14600.00 frames. ], tot_loss[loss=0.08058, simple_loss=0.1009, pruned_loss=0.01998, audio_tagging_loss=0.01014, over 3044671.33 frames. ], batch size: 56, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:20:01,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1010886.6666666666, ans=0.015 2023-11-20 08:20:01,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1010886.6666666666, ans=0.125 2023-11-20 08:20:04,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1010886.6666666666, ans=0.125 2023-11-20 08:20:07,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1010886.6666666666, ans=0.2 2023-11-20 08:20:19,877 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 151650 2023-11-20 08:20:30,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1011020.0, ans=0.5 2023-11-20 08:20:35,000 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.89 vs. limit=15.0 2023-11-20 08:20:58,579 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.57 vs. limit=22.5 2023-11-20 08:21:05,194 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 7400, loss[loss=0.07492, simple_loss=0.09098, pruned_loss=0.01893, audio_tagging_loss=0.0105, over 14754.00 frames. ], tot_loss[loss=0.07985, simple_loss=0.1003, pruned_loss=0.01976, audio_tagging_loss=0.009951, over 3040040.46 frames. 
], batch size: 56, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:21:25,017 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 151700 2023-11-20 08:21:30,866 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.761e+01 8.067e+01 8.750e+01 9.547e+01 1.451e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-20 08:21:39,427 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.89 vs. limit=6.0 2023-11-20 08:21:54,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1011420.0, ans=0.1 2023-11-20 08:22:09,936 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 7450, loss[loss=0.09134, simple_loss=0.1177, pruned_loss=0.02317, audio_tagging_loss=0.009321, over 15253.00 frames. ], tot_loss[loss=0.0796, simple_loss=0.09996, pruned_loss=0.0197, audio_tagging_loss=0.009919, over 3036154.82 frames. ], batch size: 59, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:22:18,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1011553.3333333334, ans=0.0 2023-11-20 08:22:29,157 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 151750 2023-11-20 08:22:29,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1011620.0, ans=0.0 2023-11-20 08:22:56,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1011753.3333333334, ans=0.0 2023-11-20 08:23:04,334 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 08:23:11,396 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.46 vs. limit=15.0 2023-11-20 08:23:12,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1011820.0, ans=0.2 2023-11-20 08:23:14,416 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 7500, loss[loss=0.07386, simple_loss=0.0857, pruned_loss=0.0164, audio_tagging_loss=0.01461, over 15895.00 frames. ], tot_loss[loss=0.07985, simple_loss=0.1004, pruned_loss=0.01978, audio_tagging_loss=0.009851, over 3040012.51 frames. ], batch size: 62, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:23:24,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1011886.6666666666, ans=0.0 2023-11-20 08:23:33,515 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 151800 2023-11-20 08:23:40,379 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.995e+01 8.501e+01 9.262e+01 1.002e+02 1.307e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-20 08:24:12,515 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.71 vs. limit=15.0 2023-11-20 08:24:19,070 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 7550, loss[loss=0.06336, simple_loss=0.08283, pruned_loss=0.01358, audio_tagging_loss=0.00837, over 16309.00 frames. ], tot_loss[loss=0.08014, simple_loss=0.1008, pruned_loss=0.01988, audio_tagging_loss=0.009862, over 3049659.83 frames. 
], batch size: 61, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:24:34,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1012286.6666666666, ans=0.125 2023-11-20 08:24:38,765 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 151850 2023-11-20 08:24:42,929 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.29 vs. limit=22.5 2023-11-20 08:24:51,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1012353.3333333334, ans=0.125 2023-11-20 08:25:04,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1012420.0, ans=0.05 2023-11-20 08:25:13,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1012486.6666666666, ans=0.1 2023-11-20 08:25:18,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1012486.6666666666, ans=0.125 2023-11-20 08:25:18,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1012486.6666666666, ans=0.125 2023-11-20 08:25:23,848 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 7600, loss[loss=0.07181, simple_loss=0.09424, pruned_loss=0.01697, audio_tagging_loss=0.007719, over 15566.00 frames. ], tot_loss[loss=0.07982, simple_loss=0.1004, pruned_loss=0.01968, audio_tagging_loss=0.009929, over 3054563.18 frames. ], batch size: 57, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:25:26,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1012553.3333333334, ans=0.0 2023-11-20 08:25:27,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1012553.3333333334, ans=0.2 2023-11-20 08:25:33,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1012553.3333333334, ans=0.0 2023-11-20 08:25:35,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1012620.0, ans=0.125 2023-11-20 08:25:35,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1012620.0, ans=0.125 2023-11-20 08:25:38,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1012620.0, ans=0.025 2023-11-20 08:25:42,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1012620.0, ans=0.2 2023-11-20 08:25:42,981 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 151900 2023-11-20 08:25:48,977 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.336e+01 8.168e+01 8.744e+01 9.399e+01 1.403e+02, threshold=1.749e+02, percent-clipped=0.0 2023-11-20 08:26:00,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1012686.6666666666, ans=0.025 2023-11-20 08:26:12,390 INFO [scaling.py:213] (0/4) ScheduledFloat: 
name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1012753.3333333334, ans=0.1 2023-11-20 08:26:18,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1012820.0, ans=0.1 2023-11-20 08:26:20,099 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0 2023-11-20 08:26:25,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=1012820.0, ans=0.1 2023-11-20 08:26:28,794 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 7650, loss[loss=0.08027, simple_loss=0.1039, pruned_loss=0.01999, audio_tagging_loss=0.008327, over 15343.00 frames. ], tot_loss[loss=0.07925, simple_loss=0.09957, pruned_loss=0.01949, audio_tagging_loss=0.009978, over 3056940.05 frames. ], batch size: 57, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:26:30,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1012886.6666666666, ans=0.125 2023-11-20 08:26:30,777 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.23 vs. limit=15.0 2023-11-20 08:26:33,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1012886.6666666666, ans=0.125 2023-11-20 08:26:40,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1012953.3333333334, ans=10.0 2023-11-20 08:26:48,153 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 151950 2023-11-20 08:27:02,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1013020.0, ans=0.0 2023-11-20 08:27:33,354 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 7700, loss[loss=0.07281, simple_loss=0.09022, pruned_loss=0.01753, audio_tagging_loss=0.01017, over 13421.00 frames. ], tot_loss[loss=0.07938, simple_loss=0.1001, pruned_loss=0.01952, audio_tagging_loss=0.009833, over 3049051.67 frames. 
], batch size: 53, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:27:49,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1013286.6666666666, ans=0.125 2023-11-20 08:27:53,678 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 152000 2023-11-20 08:27:55,124 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-152000.pt 2023-11-20 08:28:03,801 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.653e+01 7.881e+01 8.536e+01 9.329e+01 1.282e+02, threshold=1.707e+02, percent-clipped=0.0 2023-11-20 08:28:10,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1013353.3333333334, ans=0.95 2023-11-20 08:28:13,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1013353.3333333334, ans=0.09899494936611666 2023-11-20 08:28:19,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=1013420.0, ans=22.5 2023-11-20 08:28:31,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1013486.6666666666, ans=0.1 2023-11-20 08:28:42,953 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 7750, loss[loss=0.1017, simple_loss=0.1346, pruned_loss=0.02669, audio_tagging_loss=0.007716, over 15578.00 frames. ], tot_loss[loss=0.07944, simple_loss=0.1, pruned_loss=0.0196, audio_tagging_loss=0.009836, over 3046375.28 frames. ], batch size: 57, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:28:51,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1013553.3333333334, ans=0.125 2023-11-20 08:28:52,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1013553.3333333334, ans=0.0 2023-11-20 08:29:01,689 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 152050 2023-11-20 08:29:20,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1013753.3333333334, ans=0.125 2023-11-20 08:29:35,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=1013820.0, ans=0.05 2023-11-20 08:29:35,909 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.80 vs. limit=15.0 2023-11-20 08:29:36,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1013820.0, ans=0.0 2023-11-20 08:29:46,235 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 7800, loss[loss=0.1227, simple_loss=0.1652, pruned_loss=0.03269, audio_tagging_loss=0.007417, over 16208.00 frames. ], tot_loss[loss=0.08005, simple_loss=0.1009, pruned_loss=0.01972, audio_tagging_loss=0.009892, over 3044551.67 frames. 
], batch size: 56, lr: 5.28e-03, grad_scale: 32.0 2023-11-20 08:29:59,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1013953.3333333334, ans=0.1 2023-11-20 08:30:04,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1013953.3333333334, ans=0.125 2023-11-20 08:30:05,460 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 152100 2023-11-20 08:30:11,877 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.30 vs. limit=15.0 2023-11-20 08:30:12,164 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.723e+01 8.195e+01 8.711e+01 9.199e+01 1.205e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-20 08:30:15,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1014020.0, ans=0.1 2023-11-20 08:30:19,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1014020.0, ans=0.025 2023-11-20 08:30:19,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1014020.0, ans=0.2 2023-11-20 08:30:22,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1014020.0, ans=0.125 2023-11-20 08:30:23,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1014020.0, ans=0.125 2023-11-20 08:30:40,470 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.63 vs. limit=6.0 2023-11-20 08:30:51,130 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 7850, loss[loss=0.08307, simple_loss=0.1126, pruned_loss=0.01633, audio_tagging_loss=0.01043, over 13961.00 frames. ], tot_loss[loss=0.08005, simple_loss=0.1009, pruned_loss=0.01963, audio_tagging_loss=0.009999, over 3040965.46 frames. ], batch size: 53, lr: 5.28e-03, grad_scale: 32.0 2023-11-20 08:31:07,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1014286.6666666666, ans=0.0 2023-11-20 08:31:11,666 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 152150 2023-11-20 08:31:13,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1014286.6666666666, ans=0.125 2023-11-20 08:31:31,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1014420.0, ans=0.025 2023-11-20 08:31:39,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1014420.0, ans=0.0 2023-11-20 08:31:56,454 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 7900, loss[loss=0.06309, simple_loss=0.07906, pruned_loss=0.0128, audio_tagging_loss=0.01075, over 15755.00 frames. ], tot_loss[loss=0.07989, simple_loss=0.1008, pruned_loss=0.01947, audio_tagging_loss=0.01003, over 3044641.66 frames. 
], batch size: 59, lr: 5.28e-03, grad_scale: 32.0 2023-11-20 08:31:58,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1014553.3333333334, ans=0.125 2023-11-20 08:32:15,258 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 152200 2023-11-20 08:32:18,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1014620.0, ans=0.2 2023-11-20 08:32:22,234 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.716e+01 8.112e+01 8.963e+01 9.750e+01 1.229e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-20 08:32:37,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1014753.3333333334, ans=0.125 2023-11-20 08:32:44,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1014753.3333333334, ans=0.125 2023-11-20 08:33:00,857 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 7950, loss[loss=0.1024, simple_loss=0.13, pruned_loss=0.02967, audio_tagging_loss=0.007746, over 16353.00 frames. ], tot_loss[loss=0.08023, simple_loss=0.1009, pruned_loss=0.01959, audio_tagging_loss=0.0102, over 3042390.66 frames. ], batch size: 57, lr: 5.28e-03, grad_scale: 32.0 2023-11-20 08:33:02,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1014886.6666666666, ans=0.0 2023-11-20 08:33:17,554 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 08:33:19,988 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 152250 2023-11-20 08:33:34,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1015020.0, ans=0.125 2023-11-20 08:33:35,179 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.22 vs. limit=15.0 2023-11-20 08:33:56,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1015153.3333333334, ans=0.0 2023-11-20 08:33:56,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1015153.3333333334, ans=0.125 2023-11-20 08:33:58,277 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.96 vs. limit=12.0 2023-11-20 08:34:04,849 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 8000, loss[loss=0.0705, simple_loss=0.07902, pruned_loss=0.01745, audio_tagging_loss=0.01354, over 15414.00 frames. ], tot_loss[loss=0.08023, simple_loss=0.1005, pruned_loss=0.01964, audio_tagging_loss=0.01032, over 3038783.06 frames. 
], batch size: 58, lr: 5.28e-03, grad_scale: 32.0 2023-11-20 08:34:21,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1015286.6666666666, ans=0.2 2023-11-20 08:34:24,109 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 152300 2023-11-20 08:34:24,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1015286.6666666666, ans=0.0 2023-11-20 08:34:30,837 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.971e+01 8.267e+01 8.804e+01 9.546e+01 1.251e+02, threshold=1.761e+02, percent-clipped=0.0 2023-11-20 08:35:08,905 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 8050, loss[loss=0.07749, simple_loss=0.09456, pruned_loss=0.02085, audio_tagging_loss=0.009361, over 15031.00 frames. ], tot_loss[loss=0.08048, simple_loss=0.1007, pruned_loss=0.01985, audio_tagging_loss=0.01027, over 3041857.13 frames. ], batch size: 58, lr: 5.28e-03, grad_scale: 32.0 2023-11-20 08:35:22,523 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.24 vs. limit=6.0 2023-11-20 08:35:29,137 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 152350 2023-11-20 08:35:29,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1015620.0, ans=0.0 2023-11-20 08:36:08,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1015820.0, ans=0.125 2023-11-20 08:36:10,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1015820.0, ans=0.2 2023-11-20 08:36:12,624 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.87 vs. limit=22.5 2023-11-20 08:36:12,773 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.66 vs. limit=22.5 2023-11-20 08:36:14,456 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 8100, loss[loss=0.07828, simple_loss=0.1003, pruned_loss=0.01825, audio_tagging_loss=0.009879, over 14842.00 frames. ], tot_loss[loss=0.08089, simple_loss=0.101, pruned_loss=0.02016, audio_tagging_loss=0.01022, over 3037745.98 frames. ], batch size: 56, lr: 5.28e-03, grad_scale: 32.0 2023-11-20 08:36:33,585 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 152400 2023-11-20 08:36:39,865 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.824e+01 8.032e+01 8.730e+01 9.665e+01 1.175e+02, threshold=1.746e+02, percent-clipped=0.0 2023-11-20 08:36:40,445 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.71 vs. limit=12.0 2023-11-20 08:37:03,792 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.31 vs. limit=15.0 2023-11-20 08:37:18,476 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 8150, loss[loss=0.08288, simple_loss=0.1055, pruned_loss=0.02196, audio_tagging_loss=0.008185, over 14562.00 frames. ], tot_loss[loss=0.08102, simple_loss=0.1012, pruned_loss=0.02032, audio_tagging_loss=0.01013, over 3034940.07 frames. 
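Each scaling.py:213 record prints the current value (ans) of a ScheduledFloat: a hyperparameter such as a skip rate or dropout probability that is interpolated piecewise-linearly in batch_count, which is why the same name reappears with slowly drifting values. A minimal stand-in for that idea, not the icefall class itself:

class PiecewiseLinearFloat:
    def __init__(self, *points):
        # points: (batch_count, value) breakpoints, sorted by batch_count
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                frac = (batch_count - x0) / (x1 - x0)
                return y0 + frac * (y1 - y0)

# e.g. a dropout probability decaying from 0.3 to 0.1 over 20k batches:
dropout_p = PiecewiseLinearFloat((0.0, 0.3), (20000.0, 0.1))
assert abs(dropout_p(10000.0) - 0.2) < 1e-9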
], batch size: 57, lr: 5.28e-03, grad_scale: 16.0 2023-11-20 08:37:37,881 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 152450 2023-11-20 08:37:45,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1016353.3333333334, ans=10.0 2023-11-20 08:37:47,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1016353.3333333334, ans=0.2 2023-11-20 08:37:51,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1016353.3333333334, ans=0.07 2023-11-20 08:37:52,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1016353.3333333334, ans=0.125 2023-11-20 08:38:07,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1016420.0, ans=0.015 2023-11-20 08:38:09,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1016486.6666666666, ans=0.125 2023-11-20 08:38:22,210 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 8200, loss[loss=0.04716, simple_loss=0.06606, pruned_loss=0.006699, audio_tagging_loss=0.007428, over 15767.00 frames. ], tot_loss[loss=0.08106, simple_loss=0.1018, pruned_loss=0.02026, audio_tagging_loss=0.009912, over 3033277.94 frames. ], batch size: 59, lr: 5.28e-03, grad_scale: 16.0 2023-11-20 08:38:23,446 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 08:38:26,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1016553.3333333334, ans=0.125 2023-11-20 08:38:42,203 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 152500 2023-11-20 08:38:49,934 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.647e+01 7.999e+01 8.746e+01 9.539e+01 1.196e+02, threshold=1.749e+02, percent-clipped=0.0 2023-11-20 08:38:51,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=1016686.6666666666, ans=15.0 2023-11-20 08:38:58,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1016686.6666666666, ans=0.125 2023-11-20 08:39:10,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1016753.3333333334, ans=0.125 2023-11-20 08:39:12,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1016820.0, ans=0.125 2023-11-20 08:39:14,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1016820.0, ans=0.1 2023-11-20 08:39:22,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1016820.0, ans=0.125 2023-11-20 08:39:23,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1016820.0, ans=0.05 2023-11-20 08:39:27,135 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 8250, loss[loss=0.06549, simple_loss=0.0758, pruned_loss=0.01272, audio_tagging_loss=0.01488, over 14598.00 frames. ], tot_loss[loss=0.0803, simple_loss=0.1008, pruned_loss=0.01996, audio_tagging_loss=0.009949, over 3032886.22 frames. ], batch size: 56, lr: 5.28e-03, grad_scale: 16.0 2023-11-20 08:39:34,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1016886.6666666666, ans=0.0 2023-11-20 08:39:37,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1016886.6666666666, ans=0.125 2023-11-20 08:39:42,829 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 08:39:46,097 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 152550 2023-11-20 08:40:03,760 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.99 vs. limit=22.5 2023-11-20 08:40:08,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1017086.6666666666, ans=0.125 2023-11-20 08:40:22,662 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.619e-03 2023-11-20 08:40:29,445 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.41 vs. limit=15.0 2023-11-20 08:40:31,464 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 8300, loss[loss=0.07951, simple_loss=0.1006, pruned_loss=0.01918, audio_tagging_loss=0.01004, over 15709.00 frames. 
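The WARNING records exclude AudioSet placeholder cuts because, after subsampling, the encoder would emit fewer frames (23) than there are BPE tokens (24), and the transducer loss is undefined when T < U. A sketch of such a validity check; the helper name and the exact subsampling arithmetic are assumptions, chosen only to reproduce the 100 -> 23 figures in the warnings:

def is_trainable(num_frames: int, num_tokens: int) -> bool:
    # Frame count after the convolutional front end (illustrative
    # arithmetic; it maps 100 -> 23 as in the warnings above).
    frames_after_subsampling = (num_frames - 7) // 4
    # The transducer loss needs at least one frame per output token.
    return frames_after_subsampling >= num_tokens

assert is_trainable(100, 24) is False   # the excluded dummy cuts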
], tot_loss[loss=0.08094, simple_loss=0.1018, pruned_loss=0.02016, audio_tagging_loss=0.009887, over 3044871.31 frames. ], batch size: 58, lr: 5.28e-03, grad_scale: 16.0 2023-11-20 08:40:39,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1017220.0, ans=0.125 2023-11-20 08:40:49,669 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 152600 2023-11-20 08:40:57,996 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.507e+01 8.110e+01 8.922e+01 9.710e+01 1.456e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-20 08:41:19,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1017420.0, ans=0.2 2023-11-20 08:41:28,659 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.67 vs. limit=15.0 2023-11-20 08:41:28,851 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.58 vs. limit=15.0 2023-11-20 08:41:35,131 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 8350, loss[loss=0.0836, simple_loss=0.1107, pruned_loss=0.01964, audio_tagging_loss=0.008623, over 15151.00 frames. ], tot_loss[loss=0.08076, simple_loss=0.1018, pruned_loss=0.02005, audio_tagging_loss=0.009826, over 3047706.24 frames. ], batch size: 56, lr: 5.28e-03, grad_scale: 16.0 2023-11-20 08:41:51,652 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.73 vs. limit=10.0 2023-11-20 08:41:54,612 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 152650 2023-11-20 08:42:05,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1017686.6666666666, ans=10.0 2023-11-20 08:42:31,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1017820.0, ans=0.2 2023-11-20 08:42:33,583 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.79 vs. limit=15.0 2023-11-20 08:42:34,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1017820.0, ans=0.1 2023-11-20 08:42:39,805 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 8400, loss[loss=0.08142, simple_loss=0.1083, pruned_loss=0.01721, audio_tagging_loss=0.01006, over 14833.00 frames. ], tot_loss[loss=0.08032, simple_loss=0.1012, pruned_loss=0.01989, audio_tagging_loss=0.009856, over 3049356.15 frames. 
], batch size: 57, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:42:41,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1017886.6666666666, ans=0.1 2023-11-20 08:42:59,249 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 152700 2023-11-20 08:43:04,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1018020.0, ans=0.2 2023-11-20 08:43:06,581 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.274e+01 8.016e+01 8.880e+01 9.653e+01 1.299e+02, threshold=1.776e+02, percent-clipped=0.0 2023-11-20 08:43:26,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1018086.6666666666, ans=0.1 2023-11-20 08:43:44,769 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 8450, loss[loss=0.07996, simple_loss=0.1068, pruned_loss=0.01782, audio_tagging_loss=0.008762, over 14904.00 frames. ], tot_loss[loss=0.08014, simple_loss=0.1009, pruned_loss=0.01981, audio_tagging_loss=0.009896, over 3046519.44 frames. ], batch size: 56, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:43:58,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1018286.6666666666, ans=0.07 2023-11-20 08:44:03,271 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 152750 2023-11-20 08:44:31,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1018420.0, ans=0.0 2023-11-20 08:44:48,085 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 8500, loss[loss=0.08851, simple_loss=0.122, pruned_loss=0.01856, audio_tagging_loss=0.008954, over 15801.00 frames. ], tot_loss[loss=0.08096, simple_loss=0.1022, pruned_loss=0.02004, audio_tagging_loss=0.009849, over 3054011.55 frames. ], batch size: 55, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:45:07,847 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 152800 2023-11-20 08:45:15,427 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.959e+01 8.066e+01 8.928e+01 9.740e+01 1.439e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-20 08:45:16,029 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.16 vs. limit=15.0 2023-11-20 08:45:21,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1018686.6666666666, ans=0.2 2023-11-20 08:45:53,037 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 8550, loss[loss=0.07298, simple_loss=0.08748, pruned_loss=0.01903, audio_tagging_loss=0.01021, over 16651.00 frames. ], tot_loss[loss=0.07987, simple_loss=0.1006, pruned_loss=0.01967, audio_tagging_loss=0.009902, over 3055427.36 frames. 
], batch size: 65, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:46:09,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1018953.3333333334, ans=0.125 2023-11-20 08:46:12,915 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 152850 2023-11-20 08:46:33,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1019086.6666666666, ans=0.0 2023-11-20 08:46:39,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1019086.6666666666, ans=0.125 2023-11-20 08:46:41,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1019086.6666666666, ans=0.2 2023-11-20 08:46:57,792 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 8600, loss[loss=0.07252, simple_loss=0.09194, pruned_loss=0.01784, audio_tagging_loss=0.008714, over 14827.00 frames. ], tot_loss[loss=0.07936, simple_loss=0.1, pruned_loss=0.01935, audio_tagging_loss=0.009992, over 3051249.16 frames. ], batch size: 56, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:47:15,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1019286.6666666666, ans=0.125 2023-11-20 08:47:16,769 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 152900 2023-11-20 08:47:24,683 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.691e+01 7.921e+01 8.657e+01 9.489e+01 1.169e+02, threshold=1.731e+02, percent-clipped=0.0 2023-11-20 08:47:27,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1019353.3333333334, ans=0.125 2023-11-20 08:47:28,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1019353.3333333334, ans=0.09899494936611666 2023-11-20 08:47:38,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1019420.0, ans=0.1 2023-11-20 08:47:57,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1019486.6666666666, ans=0.125 2023-11-20 08:48:02,358 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 8650, loss[loss=0.08483, simple_loss=0.1056, pruned_loss=0.0227, audio_tagging_loss=0.009332, over 16009.00 frames. ], tot_loss[loss=0.08025, simple_loss=0.101, pruned_loss=0.01966, audio_tagging_loss=0.01008, over 3058085.42 frames. 
], batch size: 61, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:48:10,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1019553.3333333334, ans=0.1 2023-11-20 08:48:22,132 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 152950 2023-11-20 08:48:23,601 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 08:48:23,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1019620.0, ans=0.125 2023-11-20 08:48:34,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1019686.6666666666, ans=0.0 2023-11-20 08:48:37,857 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.85 vs. limit=6.0 2023-11-20 08:48:41,887 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.68 vs. limit=10.0 2023-11-20 08:48:42,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1019753.3333333334, ans=0.125 2023-11-20 08:49:06,033 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 8700, loss[loss=0.09141, simple_loss=0.1169, pruned_loss=0.02228, audio_tagging_loss=0.01066, over 16333.00 frames. ], tot_loss[loss=0.08038, simple_loss=0.1012, pruned_loss=0.01964, audio_tagging_loss=0.01014, over 3062408.38 frames. ], batch size: 61, lr: 5.27e-03, grad_scale: 16.0 2023-11-20 08:49:25,907 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 153000 2023-11-20 08:49:35,420 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.857e+01 8.289e+01 9.010e+01 9.707e+01 1.388e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-20 08:49:51,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=1020086.6666666666, ans=0.5 2023-11-20 08:50:07,027 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.53 vs. limit=15.0 2023-11-20 08:50:09,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1020153.3333333334, ans=0.0 2023-11-20 08:50:11,235 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 8750, loss[loss=0.08727, simple_loss=0.1182, pruned_loss=0.02034, audio_tagging_loss=0.007842, over 14190.00 frames. ], tot_loss[loss=0.08164, simple_loss=0.1029, pruned_loss=0.02004, audio_tagging_loss=0.01014, over 3061014.96 frames. 
], batch size: 55, lr: 5.27e-03, grad_scale: 16.0 2023-11-20 08:50:18,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1020220.0, ans=0.125 2023-11-20 08:50:27,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1020286.6666666666, ans=0.0 2023-11-20 08:50:30,885 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 153050 2023-11-20 08:50:46,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1020353.3333333334, ans=0.0 2023-11-20 08:51:01,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1020486.6666666666, ans=0.0 2023-11-20 08:51:15,740 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.53 vs. limit=15.0 2023-11-20 08:51:16,246 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 8800, loss[loss=0.06745, simple_loss=0.08339, pruned_loss=0.01572, audio_tagging_loss=0.01003, over 14366.00 frames. ], tot_loss[loss=0.08167, simple_loss=0.1029, pruned_loss=0.02006, audio_tagging_loss=0.01014, over 3060287.62 frames. ], batch size: 54, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:51:19,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1020553.3333333334, ans=0.0 2023-11-20 08:51:35,340 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 153100 2023-11-20 08:51:44,279 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.487e+01 8.394e+01 9.116e+01 1.011e+02 1.329e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-20 08:51:44,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1020686.6666666666, ans=0.125 2023-11-20 08:51:54,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1020753.3333333334, ans=0.125 2023-11-20 08:52:09,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1020820.0, ans=0.125 2023-11-20 08:52:20,904 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 8850, loss[loss=0.09436, simple_loss=0.119, pruned_loss=0.0247, audio_tagging_loss=0.01019, over 15921.00 frames. ], tot_loss[loss=0.08147, simple_loss=0.1023, pruned_loss=0.02003, audio_tagging_loss=0.01029, over 3057772.49 frames. ], batch size: 59, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:52:26,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1020886.6666666666, ans=0.125 2023-11-20 08:52:26,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1020886.6666666666, ans=0.1 2023-11-20 08:52:32,102 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.70 vs. limit=15.0 2023-11-20 08:52:33,721 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 08:52:39,929 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 153150 2023-11-20 08:53:26,058 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 8900, loss[loss=0.09169, simple_loss=0.1272, pruned_loss=0.02264, audio_tagging_loss=0.005431, over 15492.00 frames. ], tot_loss[loss=0.08119, simple_loss=0.1022, pruned_loss=0.02001, audio_tagging_loss=0.01006, over 3055783.37 frames. ], batch size: 57, lr: 5.27e-03, grad_scale: 16.0 2023-11-20 08:53:26,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1021220.0, ans=0.5 2023-11-20 08:53:45,878 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 153200 2023-11-20 08:53:55,908 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.066e+01 8.130e+01 8.663e+01 9.532e+01 1.311e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-20 08:53:56,728 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.32 vs. limit=15.0 2023-11-20 08:53:59,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1021353.3333333334, ans=0.125 2023-11-20 08:54:08,934 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.26 vs. limit=15.0 2023-11-20 08:54:18,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1021486.6666666666, ans=0.125 2023-11-20 08:54:24,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1021486.6666666666, ans=0.125 2023-11-20 08:54:31,116 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 8950, loss[loss=0.05336, simple_loss=0.0631, pruned_loss=0.01253, audio_tagging_loss=0.009271, over 14154.00 frames. ], tot_loss[loss=0.08108, simple_loss=0.1023, pruned_loss=0.02001, audio_tagging_loss=0.009926, over 3053423.04 frames. ], batch size: 57, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 08:54:40,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1021553.3333333334, ans=0.125 2023-11-20 08:54:47,672 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.58 vs. limit=8.0 2023-11-20 08:54:50,584 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 153250 2023-11-20 08:54:50,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1021620.0, ans=0.125 2023-11-20 08:55:06,925 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.79 vs. limit=10.0 2023-11-20 08:55:30,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1021820.0, ans=0.0 2023-11-20 08:55:36,059 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 9000, loss[loss=0.0641, simple_loss=0.08841, pruned_loss=0.01049, audio_tagging_loss=0.0094, over 15870.00 frames. 
], tot_loss[loss=0.08136, simple_loss=0.1029, pruned_loss=0.02006, audio_tagging_loss=0.009865, over 3054465.46 frames. ], batch size: 59, lr: 5.26e-03, grad_scale: 8.0 2023-11-20 08:55:36,090 INFO [train_asr.py:1285] (0/4) Computing validation loss 2023-11-20 08:56:18,739 INFO [train_asr.py:1294] (0/4) Epoch 13, validation: loss=0.06245, simple_loss=0.0538, pruned_loss=0.005768, audio_tagging_loss=0.02978, over 4681554.00 frames. 2023-11-20 08:56:18,740 INFO [train_asr.py:1295] (0/4) Maximum memory allocated so far is 25925MB 2023-11-20 08:56:36,994 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 153300 2023-11-20 08:56:48,914 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.462e+01 8.276e+01 8.856e+01 9.604e+01 3.298e+02, threshold=1.771e+02, percent-clipped=1.0 2023-11-20 08:56:52,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1022020.0, ans=0.1 2023-11-20 08:57:03,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1022086.6666666666, ans=0.0 2023-11-20 08:57:19,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1022153.3333333334, ans=0.125 2023-11-20 08:57:22,126 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 9050, loss[loss=0.08852, simple_loss=0.1096, pruned_loss=0.02376, audio_tagging_loss=0.009959, over 14371.00 frames. ], tot_loss[loss=0.08152, simple_loss=0.103, pruned_loss=0.02015, audio_tagging_loss=0.009855, over 3049046.27 frames. ], batch size: 53, lr: 5.26e-03, grad_scale: 8.0 2023-11-20 08:57:29,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1022220.0, ans=0.1 2023-11-20 08:57:41,931 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 153350 2023-11-20 08:57:59,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1022353.3333333334, ans=0.125 2023-11-20 08:58:24,478 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 08:58:26,689 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 9100, loss[loss=0.06696, simple_loss=0.08647, pruned_loss=0.01438, audio_tagging_loss=0.009344, over 14911.00 frames. ], tot_loss[loss=0.08107, simple_loss=0.1027, pruned_loss=0.01996, audio_tagging_loss=0.009764, over 3055039.78 frames. ], batch size: 56, lr: 5.26e-03, grad_scale: 8.0 2023-11-20 08:58:26,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1022553.3333333334, ans=0.1 2023-11-20 08:58:28,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1022553.3333333334, ans=0.125 2023-11-20 08:58:34,111 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0 2023-11-20 08:58:46,281 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 153400 2023-11-20 08:58:53,206 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.11 vs. 
limit=22.5 2023-11-20 08:58:56,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1022686.6666666666, ans=0.0 2023-11-20 08:58:57,445 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.670e+01 8.168e+01 8.794e+01 9.522e+01 1.542e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-20 08:59:05,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1022753.3333333334, ans=0.025 2023-11-20 08:59:12,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1022753.3333333334, ans=0.1 2023-11-20 08:59:16,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1022753.3333333334, ans=0.1 2023-11-20 08:59:16,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1022753.3333333334, ans=0.2 2023-11-20 08:59:31,259 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 9150, loss[loss=0.0888, simple_loss=0.1169, pruned_loss=0.02404, audio_tagging_loss=0.006322, over 14993.00 frames. ], tot_loss[loss=0.08118, simple_loss=0.1028, pruned_loss=0.02009, audio_tagging_loss=0.009699, over 3059490.59 frames. ], batch size: 54, lr: 5.26e-03, grad_scale: 8.0 2023-11-20 08:59:41,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1022886.6666666666, ans=0.0 2023-11-20 08:59:47,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1022953.3333333334, ans=0.125 2023-11-20 08:59:50,144 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 153450 2023-11-20 08:59:56,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1023020.0, ans=0.125 2023-11-20 09:00:11,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1023086.6666666666, ans=0.1 2023-11-20 09:00:12,088 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=12.0 2023-11-20 09:00:34,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1023220.0, ans=0.125 2023-11-20 09:00:35,459 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 9200, loss[loss=0.0979, simple_loss=0.1194, pruned_loss=0.02693, audio_tagging_loss=0.01128, over 15140.00 frames. ], tot_loss[loss=0.08093, simple_loss=0.1022, pruned_loss=0.02006, audio_tagging_loss=0.009744, over 3052140.51 frames. 
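The scaling.py:1022 Whitening records compare a measured feature-whiteness metric against a limit ("metric=20.11 vs. limit=22.5"), and the module only pushes a corrective gradient once the metric exceeds its limit. A plausible stand-in for such a metric, equal to 1 for perfectly white features and growing as the covariance departs from a multiple of the identity; this is an assumption, not the exact scaling.py formula:

import torch

def whiteness(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels); returns a ratio >= 1, with 1 = white.
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return (eigs ** 2).mean() / eigs.mean() ** 2

print(float(whiteness(torch.randn(8000, 192))))   # ~1.0 for white noise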
], batch size: 57, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 09:00:51,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1023286.6666666666, ans=0.125 2023-11-20 09:00:55,649 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 153500 2023-11-20 09:01:07,209 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.430e+01 8.062e+01 8.603e+01 9.204e+01 1.228e+02, threshold=1.721e+02, percent-clipped=0.0 2023-11-20 09:01:40,790 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 9250, loss[loss=0.07619, simple_loss=0.09932, pruned_loss=0.01726, audio_tagging_loss=0.009272, over 15114.00 frames. ], tot_loss[loss=0.08029, simple_loss=0.1013, pruned_loss=0.01979, audio_tagging_loss=0.009867, over 3053777.97 frames. ], batch size: 57, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 09:01:42,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1023553.3333333334, ans=0.125 2023-11-20 09:01:44,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1023553.3333333334, ans=0.125 2023-11-20 09:01:47,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1023553.3333333334, ans=0.0 2023-11-20 09:02:00,751 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 153550 2023-11-20 09:02:09,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1023686.6666666666, ans=0.05 2023-11-20 09:02:37,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1023820.0, ans=0.125 2023-11-20 09:02:41,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1023820.0, ans=0.125 2023-11-20 09:02:46,010 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 9300, loss[loss=0.07166, simple_loss=0.08582, pruned_loss=0.0151, audio_tagging_loss=0.01366, over 14457.00 frames. ], tot_loss[loss=0.08014, simple_loss=0.1007, pruned_loss=0.01986, audio_tagging_loss=0.009935, over 3055788.66 frames. 
], batch size: 56, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 09:02:49,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1023886.6666666666, ans=0.1 2023-11-20 09:02:55,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1023886.6666666666, ans=0.125 2023-11-20 09:03:01,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1023953.3333333334, ans=0.0 2023-11-20 09:03:03,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1023953.3333333334, ans=0.125 2023-11-20 09:03:03,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1023953.3333333334, ans=0.0 2023-11-20 09:03:05,359 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 153600 2023-11-20 09:03:17,351 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.484e+01 8.349e+01 9.030e+01 1.018e+02 1.384e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-20 09:03:20,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1024020.0, ans=0.0 2023-11-20 09:03:22,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1024020.0, ans=0.125 2023-11-20 09:03:32,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1024086.6666666666, ans=0.2 2023-11-20 09:03:32,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1024086.6666666666, ans=0.95 2023-11-20 09:03:47,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1024153.3333333334, ans=0.1 2023-11-20 09:03:49,544 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.17 vs. limit=15.0 2023-11-20 09:03:51,228 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 9350, loss[loss=0.08445, simple_loss=0.1068, pruned_loss=0.01853, audio_tagging_loss=0.01251, over 15373.00 frames. ], tot_loss[loss=0.07952, simple_loss=0.09977, pruned_loss=0.01966, audio_tagging_loss=0.009982, over 3054266.29 frames. 
], batch size: 58, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 09:04:05,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1024286.6666666666, ans=0.125 2023-11-20 09:04:10,021 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 153650 2023-11-20 09:04:10,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1024286.6666666666, ans=0.0 2023-11-20 09:04:12,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1024286.6666666666, ans=0.0 2023-11-20 09:04:20,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1024353.3333333334, ans=0.125 2023-11-20 09:04:23,717 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.50 vs. limit=15.0 2023-11-20 09:04:35,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1024420.0, ans=0.125 2023-11-20 09:04:43,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1024486.6666666666, ans=0.2 2023-11-20 09:04:45,820 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:04:50,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1024486.6666666666, ans=0.1 2023-11-20 09:04:53,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1024553.3333333334, ans=0.125 2023-11-20 09:04:54,531 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 9400, loss[loss=0.11, simple_loss=0.1375, pruned_loss=0.03048, audio_tagging_loss=0.01078, over 15299.00 frames. ], tot_loss[loss=0.07992, simple_loss=0.1005, pruned_loss=0.01969, audio_tagging_loss=0.01001, over 3042626.74 frames. ], batch size: 56, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 09:05:00,005 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.57 vs. limit=10.0 2023-11-20 09:05:14,397 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 153700 2023-11-20 09:05:25,818 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.09 vs. limit=15.0 2023-11-20 09:05:26,174 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.113e+01 8.189e+01 8.740e+01 9.691e+01 1.507e+02, threshold=1.748e+02, percent-clipped=0.0 2023-11-20 09:05:57,737 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 09:05:58,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1024886.6666666666, ans=0.2 2023-11-20 09:05:59,521 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 9450, loss[loss=0.08958, simple_loss=0.1145, pruned_loss=0.02151, audio_tagging_loss=0.01083, over 16813.00 frames. ], tot_loss[loss=0.07963, simple_loss=0.09986, pruned_loss=0.01966, audio_tagging_loss=0.01004, over 3041819.77 frames. ], batch size: 60, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 09:06:14,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1024953.3333333334, ans=0.0 2023-11-20 09:06:14,323 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.81 vs. limit=15.0 2023-11-20 09:06:18,723 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 153750 2023-11-20 09:06:24,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1025020.0, ans=0.125 2023-11-20 09:06:27,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1025020.0, ans=0.0 2023-11-20 09:06:41,678 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:06:49,224 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.36 vs. limit=22.5 2023-11-20 09:06:51,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1025153.3333333334, ans=0.0 2023-11-20 09:06:59,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=1025153.3333333334, ans=12.0 2023-11-20 09:07:04,144 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 9500, loss[loss=0.06875, simple_loss=0.08501, pruned_loss=0.01503, audio_tagging_loss=0.01122, over 15253.00 frames. ], tot_loss[loss=0.0802, simple_loss=0.1004, pruned_loss=0.01981, audio_tagging_loss=0.0102, over 3042721.30 frames. ], batch size: 58, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 09:07:23,685 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 153800 2023-11-20 09:07:35,609 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.073e+01 8.164e+01 8.858e+01 9.394e+01 2.637e+02, threshold=1.772e+02, percent-clipped=1.0 2023-11-20 09:07:44,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1025420.0, ans=0.0 2023-11-20 09:07:58,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1025486.6666666666, ans=0.0 2023-11-20 09:08:09,391 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 9550, loss[loss=0.09081, simple_loss=0.1157, pruned_loss=0.02062, audio_tagging_loss=0.01235, over 15242.00 frames. ], tot_loss[loss=0.08031, simple_loss=0.1004, pruned_loss=0.01988, audio_tagging_loss=0.01025, over 3042371.95 frames. 
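The grad_scale field in the batch summaries halves after an overflow and later doubles back (32 -> 16 -> 8 -> 16 -> 32 across the records above), the standard dynamic loss-scaling behaviour of fp16 training. A minimal sketch using torch.cuda.amp.GradScaler; the toy model and step function are illustrative only:

import torch

model = torch.nn.Linear(16, 16).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

def train_step(features: torch.Tensor) -> None:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(features).pow(2).mean()
    scaler.scale(loss).backward()  # backward through the scaled loss
    scaler.step(optimizer)         # step is skipped if grads overflowed
    scaler.update()                # halve on overflow, grow when stable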
], batch size: 57, lr: 5.25e-03, grad_scale: 16.0 2023-11-20 09:08:12,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1025553.3333333334, ans=0.125 2023-11-20 09:08:16,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1025553.3333333334, ans=0.125 2023-11-20 09:08:18,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1025553.3333333334, ans=0.125 2023-11-20 09:08:29,309 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 153850 2023-11-20 09:08:47,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1025753.3333333334, ans=0.125 2023-11-20 09:09:10,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1025820.0, ans=0.1 2023-11-20 09:09:14,899 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 9600, loss[loss=0.0842, simple_loss=0.09998, pruned_loss=0.0212, audio_tagging_loss=0.01301, over 15336.00 frames. ], tot_loss[loss=0.08009, simple_loss=0.1001, pruned_loss=0.01973, audio_tagging_loss=0.01031, over 3041835.91 frames. ], batch size: 59, lr: 5.25e-03, grad_scale: 32.0 2023-11-20 09:09:32,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1025953.3333333334, ans=0.0 2023-11-20 09:09:34,181 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 153900 2023-11-20 09:09:36,959 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.26 vs. limit=15.0 2023-11-20 09:09:44,971 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.678e+01 7.953e+01 8.946e+01 9.937e+01 1.277e+02, threshold=1.789e+02, percent-clipped=0.0 2023-11-20 09:09:54,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1026086.6666666666, ans=0.0 2023-11-20 09:09:58,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1026086.6666666666, ans=0.125 2023-11-20 09:10:07,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1026153.3333333334, ans=0.1 2023-11-20 09:10:19,605 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 9650, loss[loss=0.07126, simple_loss=0.08833, pruned_loss=0.01384, audio_tagging_loss=0.01325, over 14690.00 frames. ], tot_loss[loss=0.08042, simple_loss=0.1003, pruned_loss=0.01987, audio_tagging_loss=0.01039, over 3044352.10 frames. ], batch size: 57, lr: 5.25e-03, grad_scale: 32.0 2023-11-20 09:10:28,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1026220.0, ans=0.125 2023-11-20 09:10:38,870 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 153950 2023-11-20 09:11:02,212 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.07 vs. limit=15.0 2023-11-20 09:11:04,028 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.96 vs. 
limit=15.0 2023-11-20 09:11:23,338 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 9700, loss[loss=0.08287, simple_loss=0.09285, pruned_loss=0.02321, audio_tagging_loss=0.01324, over 15272.00 frames. ], tot_loss[loss=0.08037, simple_loss=0.1006, pruned_loss=0.01996, audio_tagging_loss=0.01012, over 3052097.99 frames. ], batch size: 57, lr: 5.25e-03, grad_scale: 32.0 2023-11-20 09:11:43,181 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 154000 2023-11-20 09:11:54,853 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.479e+01 8.116e+01 8.846e+01 9.566e+01 1.154e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-20 09:11:57,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1026686.6666666666, ans=0.125 2023-11-20 09:12:02,244 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.09 vs. limit=15.0 2023-11-20 09:12:05,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1026753.3333333334, ans=0.2 2023-11-20 09:12:11,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1026753.3333333334, ans=0.125 2023-11-20 09:12:14,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1026820.0, ans=0.1 2023-11-20 09:12:19,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1026820.0, ans=0.125 2023-11-20 09:12:27,841 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 9750, loss[loss=0.104, simple_loss=0.1256, pruned_loss=0.03134, audio_tagging_loss=0.009894, over 15123.00 frames. ], tot_loss[loss=0.08029, simple_loss=0.1005, pruned_loss=0.01992, audio_tagging_loss=0.01009, over 3046998.50 frames. ], batch size: 56, lr: 5.25e-03, grad_scale: 32.0 2023-11-20 09:12:43,411 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.60 vs. limit=12.0 2023-11-20 09:12:45,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1026953.3333333334, ans=0.0 2023-11-20 09:12:48,289 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 154050 2023-11-20 09:13:16,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1027086.6666666666, ans=0.025 2023-11-20 09:13:30,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1027153.3333333334, ans=0.125 2023-11-20 09:13:32,828 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 9800, loss[loss=0.09355, simple_loss=0.1242, pruned_loss=0.0215, audio_tagging_loss=0.009933, over 16666.00 frames. ], tot_loss[loss=0.08054, simple_loss=0.1011, pruned_loss=0.02011, audio_tagging_loss=0.009901, over 3050232.24 frames. 
], batch size: 60, lr: 5.25e-03, grad_scale: 32.0 2023-11-20 09:13:51,961 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 154100 2023-11-20 09:14:03,354 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 8.195e+01 8.924e+01 9.693e+01 1.492e+02, threshold=1.785e+02, percent-clipped=0.0 2023-11-20 09:14:10,916 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.05 vs. limit=22.5 2023-11-20 09:14:12,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1027420.0, ans=0.2 2023-11-20 09:14:18,012 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.64 vs. limit=12.0 2023-11-20 09:14:20,033 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.71 vs. limit=22.5 2023-11-20 09:14:30,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1027486.6666666666, ans=0.1 2023-11-20 09:14:30,986 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 09:14:34,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1027486.6666666666, ans=0.1 2023-11-20 09:14:34,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1027486.6666666666, ans=0.1 2023-11-20 09:14:37,016 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 9850, loss[loss=0.06542, simple_loss=0.08331, pruned_loss=0.01429, audio_tagging_loss=0.009481, over 16246.00 frames. ], tot_loss[loss=0.08012, simple_loss=0.1006, pruned_loss=0.01991, audio_tagging_loss=0.009903, over 3053442.31 frames. ], batch size: 64, lr: 5.25e-03, grad_scale: 32.0 2023-11-20 09:14:38,954 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.96 vs. limit=12.0 2023-11-20 09:14:42,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1027553.3333333334, ans=0.2 2023-11-20 09:14:51,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1027620.0, ans=0.125 2023-11-20 09:14:56,570 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 154150 2023-11-20 09:15:04,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1027686.6666666666, ans=0.125 2023-11-20 09:15:13,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1027686.6666666666, ans=0.0 2023-11-20 09:15:13,851 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.03 vs. 
2023-11-20 09:15:13,851 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.03 vs. limit=15.0
2023-11-20 09:15:41,464 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 9900, loss[loss=0.09978, simple_loss=0.1292, pruned_loss=0.02346, audio_tagging_loss=0.01173, over 15600.00 frames. ], tot_loss[loss=0.0804, simple_loss=0.1008, pruned_loss=0.0201, audio_tagging_loss=0.009927, over 3051101.68 frames. ], batch size: 54, lr: 5.25e-03, grad_scale: 16.0
2023-11-20 09:15:57,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1027953.3333333334, ans=0.07
2023-11-20 09:16:01,316 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 154200
2023-11-20 09:16:05,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1027953.3333333334, ans=0.05
2023-11-20 09:16:14,639 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.724e+01 8.225e+01 8.765e+01 9.379e+01 1.572e+02, threshold=1.753e+02, percent-clipped=0.0
2023-11-20 09:16:20,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1028086.6666666666, ans=0.0
2023-11-20 09:16:27,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1028086.6666666666, ans=15.0
2023-11-20 09:16:37,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1028153.3333333334, ans=0.125
2023-11-20 09:16:37,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1028153.3333333334, ans=0.125
2023-11-20 09:16:43,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1028153.3333333334, ans=0.0
2023-11-20 09:16:45,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1028220.0, ans=0.125
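The [optim.py:476] lines summarize ScaledAdam-style gradient clipping: the optimizer keeps a window of recent gradient norms, logs their quartiles, and derives the clipping threshold as Clipping_scale times the median (in the entry above, 2.0 × 8.765e+01 ≈ 1.753e+02); percent-clipped=0.0 means no batch exceeded it. A rough sketch of that bookkeeping, not icefall's exact implementation:

```python
import torch

def clipping_report(recent_norms, this_norm, clipping_scale=2.0):
    q = torch.quantile(torch.tensor(recent_norms),
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2].item()       # scale * median
    grad_factor = min(1.0, threshold / this_norm)  # < 1.0 only for outlier batches
    return q.tolist(), threshold, grad_factor

# Norms chosen to mirror the quartiles logged at 09:16:14 above:
quartiles, threshold, factor = clipping_report(
    [67.24, 82.25, 87.65, 93.79, 157.2], this_norm=90.0)
print(threshold, factor)   # ~175.3, 1.0 -> percent-clipped stays at 0.0
```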
2023-11-20 09:16:47,245 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 9950, loss[loss=0.08818, simple_loss=0.1181, pruned_loss=0.02038, audio_tagging_loss=0.008735, over 15899.00 frames. ], tot_loss[loss=0.08034, simple_loss=0.1012, pruned_loss=0.01993, audio_tagging_loss=0.009826, over 3059687.02 frames. ], batch size: 56, lr: 5.25e-03, grad_scale: 16.0
2023-11-20 09:16:48,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1028220.0, ans=0.1
2023-11-20 09:16:56,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1028220.0, ans=0.0
2023-11-20 09:17:05,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1028286.6666666666, ans=0.0
2023-11-20 09:17:06,166 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 154250
2023-11-20 09:17:19,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1028353.3333333334, ans=0.125
2023-11-20 09:17:20,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1028353.3333333334, ans=0.2
2023-11-20 09:17:21,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1028353.3333333334, ans=0.125
2023-11-20 09:17:25,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1028420.0, ans=0.0
2023-11-20 09:17:40,199 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.45 vs. limit=15.0
2023-11-20 09:17:41,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1028486.6666666666, ans=0.125
2023-11-20 09:17:51,785 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 10000, loss[loss=0.09339, simple_loss=0.1132, pruned_loss=0.02773, audio_tagging_loss=0.009048, over 15587.00 frames. ], tot_loss[loss=0.07938, simple_loss=0.0998, pruned_loss=0.01957, audio_tagging_loss=0.009912, over 3066891.21 frames. ], batch size: 58, lr: 5.25e-03, grad_scale: 32.0
2023-11-20 09:17:55,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1028553.3333333334, ans=0.125
2023-11-20 09:18:10,884 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 154300
2023-11-20 09:18:23,636 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.484e+01 8.210e+01 8.752e+01 9.474e+01 1.370e+02, threshold=1.750e+02, percent-clipped=0.0
2023-11-20 09:18:33,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1028753.3333333334, ans=0.125
2023-11-20 09:18:40,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1028753.3333333334, ans=0.125
2023-11-20 09:18:56,655 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 10050, loss[loss=0.08496, simple_loss=0.1096, pruned_loss=0.02268, audio_tagging_loss=0.007466, over 14630.00 frames. ], tot_loss[loss=0.08016, simple_loss=0.1009, pruned_loss=0.01988, audio_tagging_loss=0.009842, over 3066838.22 frames. ], batch size: 56, lr: 5.24e-03, grad_scale: 32.0
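The [scaling.py:1022] Whitening entries track how far the channel covariance of an activation is from being "white" (a multiple of the identity); the module only intervenes, via a gradient penalty, once the metric crosses its limit, which none of the entries above do (e.g. metric=11.45 vs. limit=15.0). The metric below is an illustrative stand-in, not necessarily the library's exact formula:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels). Returns mean(eig^2) / mean(eig)^2 of the
    channel covariance: ~1.0 for white features, larger when a few
    directions dominate."""
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

print(whitening_metric(torch.randn(2000, 384)))  # ~1.2: nearly white
print(whitening_metric(torch.randn(2000, 1) * torch.ones(1, 384)))  # ~384: rank-1
```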
2023-11-20 09:18:57,138 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.68 vs. limit=15.0
2023-11-20 09:18:58,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1028886.6666666666, ans=0.0
2023-11-20 09:18:58,700 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.09 vs. limit=22.5
2023-11-20 09:19:16,393 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 154350
2023-11-20 09:19:34,933 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.89 vs. limit=15.0
2023-11-20 09:19:39,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1029086.6666666666, ans=0.2
2023-11-20 09:19:41,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1029086.6666666666, ans=0.125
2023-11-20 09:20:01,393 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 10100, loss[loss=0.08035, simple_loss=0.1036, pruned_loss=0.02026, audio_tagging_loss=0.008312, over 15181.00 frames. ], tot_loss[loss=0.08039, simple_loss=0.101, pruned_loss=0.01997, audio_tagging_loss=0.009908, over 3068940.70 frames. ], batch size: 56, lr: 5.25e-03, grad_scale: 16.0
2023-11-20 09:20:04,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1029220.0, ans=0.125
2023-11-20 09:20:20,392 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 154400
2023-11-20 09:20:35,220 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.925e+01 8.287e+01 9.301e+01 1.019e+02 1.504e+02, threshold=1.860e+02, percent-clipped=0.0
2023-11-20 09:20:45,206 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.08 vs. limit=22.5
2023-11-20 09:20:53,502 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 09:21:05,871 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 10150, loss[loss=0.06293, simple_loss=0.07047, pruned_loss=0.01587, audio_tagging_loss=0.01183, over 15634.00 frames. ], tot_loss[loss=0.08032, simple_loss=0.1007, pruned_loss=0.01988, audio_tagging_loss=0.01007, over 3064651.47 frames. ], batch size: 64, lr: 5.24e-03, grad_scale: 16.0
2023-11-20 09:21:18,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1029620.0, ans=0.125
2023-11-20 09:21:25,613 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 154450
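The loss[...] values in the [train_asr.py:1262] entries are consistent with a weighted sum of the logged components, loss ≈ 0.5·simple_loss + pruned_loss + 1.0·audio_tagging_loss (for batch 10150 above: 0.5·0.07047 + 0.01587 + 0.01183 ≈ 0.06293), while tot_loss is the same quantity aggregated over the recent window of batches, hence "over 3064651.47 frames". A sketch of that combination, with the scales read off the logged numbers rather than asserted from any config:

```python
def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                  simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
    # Weighted total as implied by the numbers this run prints.
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Batch 10150 above: reproduces loss=0.06293 to the printed precision.
print(combined_loss(0.07047, 0.01587, 0.01183))  # 0.062935
```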
2023-11-20 09:21:36,048 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 09:21:51,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1029753.3333333334, ans=0.2
2023-11-20 09:22:07,422 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.72 vs. limit=15.0
2023-11-20 09:22:10,480 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 10200, loss[loss=0.07786, simple_loss=0.08995, pruned_loss=0.02147, audio_tagging_loss=0.01141, over 16046.00 frames. ], tot_loss[loss=0.0802, simple_loss=0.1003, pruned_loss=0.0199, audio_tagging_loss=0.01015, over 3072314.85 frames. ], batch size: 61, lr: 5.24e-03, grad_scale: 16.0
2023-11-20 09:22:24,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1029953.3333333334, ans=0.125
2023-11-20 09:22:29,560 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 154500
2023-11-20 09:22:34,904 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 09:22:40,317 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.66 vs. limit=12.0
2023-11-20 09:22:43,345 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.371e+01 8.118e+01 8.919e+01 9.960e+01 1.274e+02, threshold=1.784e+02, percent-clipped=0.0
2023-11-20 09:22:48,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1030086.6666666666, ans=0.125
2023-11-20 09:23:10,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1030153.3333333334, ans=0.0
2023-11-20 09:23:13,983 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 10250, loss[loss=0.0905, simple_loss=0.1121, pruned_loss=0.02551, audio_tagging_loss=0.008919, over 14949.00 frames. ], tot_loss[loss=0.08049, simple_loss=0.1006, pruned_loss=0.01996, audio_tagging_loss=0.01023, over 3067042.72 frames. ], batch size: 56, lr: 5.24e-03, grad_scale: 16.0
2023-11-20 09:23:33,213 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 154550
2023-11-20 09:23:41,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1030353.3333333334, ans=0.125
2023-11-20 09:23:45,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1030353.3333333334, ans=0.125
2023-11-20 09:23:49,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1030353.3333333334, ans=0.1
2023-11-20 09:24:19,142 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 10300, loss[loss=0.07541, simple_loss=0.09856, pruned_loss=0.01586, audio_tagging_loss=0.01027, over 15893.00 frames. ], tot_loss[loss=0.08012, simple_loss=0.09999, pruned_loss=0.01976, audio_tagging_loss=0.01037, over 3061922.01 frames.
], batch size: 58, lr: 5.24e-03, grad_scale: 16.0 2023-11-20 09:24:35,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1030620.0, ans=0.0 2023-11-20 09:24:38,886 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 154600 2023-11-20 09:24:40,897 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.69 vs. limit=15.0 2023-11-20 09:24:48,987 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.51 vs. limit=15.0 2023-11-20 09:24:51,132 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:24:52,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1030686.6666666666, ans=0.0 2023-11-20 09:24:53,305 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 8.311e+01 8.875e+01 9.603e+01 1.202e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-20 09:24:59,584 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.70 vs. limit=6.0 2023-11-20 09:25:11,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1030820.0, ans=0.0 2023-11-20 09:25:17,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1030820.0, ans=0.125 2023-11-20 09:25:24,171 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 10350, loss[loss=0.0932, simple_loss=0.116, pruned_loss=0.0263, audio_tagging_loss=0.008895, over 15196.00 frames. ], tot_loss[loss=0.08025, simple_loss=0.1002, pruned_loss=0.01973, audio_tagging_loss=0.01043, over 3060083.11 frames. ], batch size: 58, lr: 5.24e-03, grad_scale: 16.0 2023-11-20 09:25:37,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1030953.3333333334, ans=0.0 2023-11-20 09:25:39,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1030953.3333333334, ans=0.125 2023-11-20 09:25:43,767 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 154650 2023-11-20 09:26:04,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1031086.6666666666, ans=0.125 2023-11-20 09:26:14,184 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.34 vs. limit=15.0 2023-11-20 09:26:15,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1031153.3333333334, ans=0.1 2023-11-20 09:26:23,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1031153.3333333334, ans=0.125 2023-11-20 09:26:25,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1031153.3333333334, ans=0.0 2023-11-20 09:26:25,654 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.21 vs. 
limit=10.0 2023-11-20 09:26:29,378 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 10400, loss[loss=0.07384, simple_loss=0.09571, pruned_loss=0.01294, audio_tagging_loss=0.01305, over 15358.00 frames. ], tot_loss[loss=0.08085, simple_loss=0.1011, pruned_loss=0.01994, audio_tagging_loss=0.01037, over 3063167.86 frames. ], batch size: 58, lr: 5.24e-03, grad_scale: 32.0 2023-11-20 09:26:48,747 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 154700 2023-11-20 09:27:03,086 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.158e+01 8.127e+01 8.781e+01 9.645e+01 1.274e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-20 09:27:14,551 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.16 vs. limit=15.0 2023-11-20 09:27:25,667 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:27:34,478 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 10450, loss[loss=0.06038, simple_loss=0.07335, pruned_loss=0.01001, audio_tagging_loss=0.0137, over 15618.00 frames. ], tot_loss[loss=0.08048, simple_loss=0.1004, pruned_loss=0.0199, audio_tagging_loss=0.01036, over 3059001.81 frames. ], batch size: 59, lr: 5.24e-03, grad_scale: 32.0 2023-11-20 09:27:47,893 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.51 vs. limit=15.0 2023-11-20 09:27:53,797 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 154750 2023-11-20 09:27:55,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1031620.0, ans=0.125 2023-11-20 09:27:55,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1031620.0, ans=0.1 2023-11-20 09:28:12,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1031753.3333333334, ans=0.07 2023-11-20 09:28:38,632 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 10500, loss[loss=0.08954, simple_loss=0.1055, pruned_loss=0.02559, audio_tagging_loss=0.01121, over 14915.00 frames. ], tot_loss[loss=0.07959, simple_loss=0.09913, pruned_loss=0.01978, audio_tagging_loss=0.01025, over 3057999.36 frames. ], batch size: 56, lr: 5.24e-03, grad_scale: 32.0 2023-11-20 09:28:42,183 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.74 vs. 
limit=15.0 2023-11-20 09:28:44,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1031886.6666666666, ans=0.125 2023-11-20 09:28:51,219 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:28:59,018 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 154800 2023-11-20 09:29:13,388 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.926e+01 8.208e+01 9.112e+01 1.062e+02 1.393e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-20 09:29:22,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1032086.6666666666, ans=0.1 2023-11-20 09:29:28,757 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.27 vs. limit=15.0 2023-11-20 09:29:45,118 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 10550, loss[loss=0.07703, simple_loss=0.09837, pruned_loss=0.02019, audio_tagging_loss=0.007655, over 15334.00 frames. ], tot_loss[loss=0.07874, simple_loss=0.09796, pruned_loss=0.01954, audio_tagging_loss=0.01022, over 3054632.49 frames. ], batch size: 57, lr: 5.24e-03, grad_scale: 32.0 2023-11-20 09:29:55,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1032220.0, ans=0.125 2023-11-20 09:29:55,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1032220.0, ans=0.125 2023-11-20 09:30:04,324 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 154850 2023-11-20 09:30:04,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1032286.6666666666, ans=0.125 2023-11-20 09:30:06,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1032286.6666666666, ans=0.125 2023-11-20 09:30:27,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1032420.0, ans=0.0 2023-11-20 09:30:27,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1032420.0, ans=0.0 2023-11-20 09:30:29,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1032420.0, ans=0.1 2023-11-20 09:30:38,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1032486.6666666666, ans=0.125 2023-11-20 09:30:49,032 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 10600, loss[loss=0.07659, simple_loss=0.1005, pruned_loss=0.01612, audio_tagging_loss=0.01022, over 14805.00 frames. ], tot_loss[loss=0.07984, simple_loss=0.09986, pruned_loss=0.01987, audio_tagging_loss=0.01004, over 3058293.66 frames. 
], batch size: 55, lr: 5.24e-03, grad_scale: 32.0 2023-11-20 09:30:49,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1032553.3333333334, ans=0.1 2023-11-20 09:30:49,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1032553.3333333334, ans=0.1 2023-11-20 09:31:05,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1032620.0, ans=0.125 2023-11-20 09:31:08,143 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 154900 2023-11-20 09:31:11,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1032620.0, ans=0.125 2023-11-20 09:31:11,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1032620.0, ans=0.2 2023-11-20 09:31:21,992 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.038e+01 8.175e+01 8.791e+01 9.542e+01 1.185e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-20 09:31:24,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1032686.6666666666, ans=0.0 2023-11-20 09:31:52,002 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 10650, loss[loss=0.1155, simple_loss=0.151, pruned_loss=0.03243, audio_tagging_loss=0.007536, over 15048.00 frames. ], tot_loss[loss=0.08087, simple_loss=0.1014, pruned_loss=0.02021, audio_tagging_loss=0.009969, over 3049703.02 frames. ], batch size: 53, lr: 5.24e-03, grad_scale: 32.0 2023-11-20 09:32:12,493 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 154950 2023-11-20 09:32:40,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1033086.6666666666, ans=0.025 2023-11-20 09:32:54,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1033153.3333333334, ans=0.025 2023-11-20 09:32:56,689 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 10700, loss[loss=0.06277, simple_loss=0.08695, pruned_loss=0.01121, audio_tagging_loss=0.008085, over 14718.00 frames. ], tot_loss[loss=0.08025, simple_loss=0.1005, pruned_loss=0.01996, audio_tagging_loss=0.01004, over 3043724.49 frames. ], batch size: 54, lr: 5.24e-03, grad_scale: 32.0 2023-11-20 09:33:16,893 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 155000 2023-11-20 09:33:30,105 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.43 vs. 
limit=15.0 2023-11-20 09:33:30,434 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.016e+01 8.303e+01 8.834e+01 9.458e+01 1.451e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-20 09:33:30,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1033353.3333333334, ans=0.125 2023-11-20 09:33:57,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1033486.6666666666, ans=0.125 2023-11-20 09:33:58,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1033486.6666666666, ans=0.1 2023-11-20 09:34:02,490 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 10750, loss[loss=0.08059, simple_loss=0.1076, pruned_loss=0.01912, audio_tagging_loss=0.007674, over 14421.00 frames. ], tot_loss[loss=0.07951, simple_loss=0.09957, pruned_loss=0.01971, audio_tagging_loss=0.01001, over 3050007.52 frames. ], batch size: 56, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:34:21,023 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 155050 2023-11-20 09:34:40,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1033753.3333333334, ans=0.2 2023-11-20 09:34:42,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1033753.3333333334, ans=0.1 2023-11-20 09:34:44,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1033753.3333333334, ans=0.0 2023-11-20 09:34:54,631 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.22 vs. limit=15.0 2023-11-20 09:35:06,248 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 10800, loss[loss=0.1037, simple_loss=0.1266, pruned_loss=0.03365, audio_tagging_loss=0.006796, over 16159.00 frames. ], tot_loss[loss=0.07949, simple_loss=0.09963, pruned_loss=0.01966, audio_tagging_loss=0.01001, over 3046881.55 frames. ], batch size: 60, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:35:06,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1033886.6666666666, ans=0.125 2023-11-20 09:35:26,117 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 155100 2023-11-20 09:35:40,781 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.539e+01 8.046e+01 8.532e+01 9.175e+01 1.216e+02, threshold=1.706e+02, percent-clipped=0.0 2023-11-20 09:35:41,489 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.63 vs. limit=22.5 2023-11-20 09:35:44,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1034086.6666666666, ans=0.125 2023-11-20 09:35:46,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1034086.6666666666, ans=0.1 2023-11-20 09:35:52,606 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.24 vs. 
limit=15.0 2023-11-20 09:35:53,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1034086.6666666666, ans=0.0 2023-11-20 09:36:11,284 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 10850, loss[loss=0.08612, simple_loss=0.1023, pruned_loss=0.026, audio_tagging_loss=0.008956, over 14262.00 frames. ], tot_loss[loss=0.08012, simple_loss=0.1006, pruned_loss=0.01989, audio_tagging_loss=0.009929, over 3056261.21 frames. ], batch size: 53, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:36:32,061 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 155150 2023-11-20 09:36:45,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1034353.3333333334, ans=0.125 2023-11-20 09:36:57,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1034420.0, ans=0.125 2023-11-20 09:37:05,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1034486.6666666666, ans=0.125 2023-11-20 09:37:12,675 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 09:37:16,335 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 10900, loss[loss=0.08297, simple_loss=0.1035, pruned_loss=0.02203, audio_tagging_loss=0.009203, over 14202.00 frames. ], tot_loss[loss=0.08006, simple_loss=0.1004, pruned_loss=0.01992, audio_tagging_loss=0.009946, over 3050502.01 frames. ], batch size: 54, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:37:22,550 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.00 vs. limit=15.0 2023-11-20 09:37:30,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1034620.0, ans=0.1 2023-11-20 09:37:35,734 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 155200 2023-11-20 09:37:41,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1034686.6666666666, ans=0.5 2023-11-20 09:37:50,144 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.033e+01 8.200e+01 8.772e+01 9.722e+01 1.243e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-20 09:38:20,707 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 10950, loss[loss=0.05995, simple_loss=0.06901, pruned_loss=0.01334, audio_tagging_loss=0.0121, over 14508.00 frames. ], tot_loss[loss=0.07966, simple_loss=0.09989, pruned_loss=0.01962, audio_tagging_loss=0.01009, over 3051744.78 frames. 
], batch size: 56, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:38:27,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1034886.6666666666, ans=0.125 2023-11-20 09:38:38,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1034953.3333333334, ans=0.125 2023-11-20 09:38:39,888 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 155250 2023-11-20 09:38:41,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1034953.3333333334, ans=0.1 2023-11-20 09:38:49,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1035020.0, ans=0.09899494936611666 2023-11-20 09:39:24,992 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 11000, loss[loss=0.06793, simple_loss=0.08807, pruned_loss=0.01209, audio_tagging_loss=0.01181, over 15962.00 frames. ], tot_loss[loss=0.08026, simple_loss=0.1007, pruned_loss=0.01977, audio_tagging_loss=0.01013, over 3057341.70 frames. ], batch size: 59, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:39:34,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1035220.0, ans=0.125 2023-11-20 09:39:35,547 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 09:39:44,324 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 155300 2023-11-20 09:39:51,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1035353.3333333334, ans=0.0 2023-11-20 09:39:56,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1035353.3333333334, ans=0.125 2023-11-20 09:39:59,019 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.020e+01 8.120e+01 8.817e+01 9.505e+01 1.234e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-20 09:40:15,101 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0 2023-11-20 09:40:16,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1035486.6666666666, ans=0.1 2023-11-20 09:40:16,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1035486.6666666666, ans=0.1 2023-11-20 09:40:29,863 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 11050, loss[loss=0.1086, simple_loss=0.143, pruned_loss=0.02638, audio_tagging_loss=0.01074, over 15604.00 frames. ], tot_loss[loss=0.08035, simple_loss=0.1005, pruned_loss=0.01987, audio_tagging_loss=0.01023, over 3057740.03 frames. 
], batch size: 56, lr: 5.23e-03, grad_scale: 16.0 2023-11-20 09:40:41,751 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.82 vs. limit=22.5 2023-11-20 09:40:47,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1035620.0, ans=0.125 2023-11-20 09:40:49,955 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 155350 2023-11-20 09:41:06,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1035686.6666666666, ans=0.125 2023-11-20 09:41:12,072 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:41:34,885 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 11100, loss[loss=0.1025, simple_loss=0.1353, pruned_loss=0.02796, audio_tagging_loss=0.006891, over 15120.00 frames. ], tot_loss[loss=0.08083, simple_loss=0.1012, pruned_loss=0.01995, audio_tagging_loss=0.01026, over 3050501.49 frames. ], batch size: 56, lr: 5.23e-03, grad_scale: 16.0 2023-11-20 09:41:35,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1035886.6666666666, ans=0.125 2023-11-20 09:41:50,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1035953.3333333334, ans=0.1 2023-11-20 09:41:54,074 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 155400 2023-11-20 09:42:09,687 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.838e+01 8.254e+01 9.028e+01 9.759e+01 1.162e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-20 09:42:25,145 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.29 vs. limit=15.0 2023-11-20 09:42:39,739 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 11150, loss[loss=0.1033, simple_loss=0.1298, pruned_loss=0.02713, audio_tagging_loss=0.01128, over 16338.00 frames. ], tot_loss[loss=0.08103, simple_loss=0.1012, pruned_loss=0.02001, audio_tagging_loss=0.0104, over 3050136.51 frames. ], batch size: 58, lr: 5.23e-03, grad_scale: 16.0 2023-11-20 09:42:40,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1036220.0, ans=0.0 2023-11-20 09:42:58,716 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 155450 2023-11-20 09:43:06,868 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:43:44,270 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 11200, loss[loss=0.08266, simple_loss=0.1071, pruned_loss=0.01847, audio_tagging_loss=0.01066, over 15370.00 frames. ], tot_loss[loss=0.08026, simple_loss=0.1005, pruned_loss=0.01954, audio_tagging_loss=0.01047, over 3046445.15 frames. 
], batch size: 56, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:43:44,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1036553.3333333334, ans=0.125 2023-11-20 09:43:47,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1036553.3333333334, ans=0.5 2023-11-20 09:44:01,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1036620.0, ans=0.0 2023-11-20 09:44:03,906 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 155500 2023-11-20 09:44:11,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1036686.6666666666, ans=0.0 2023-11-20 09:44:19,033 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.839e+01 8.019e+01 8.498e+01 9.323e+01 1.224e+02, threshold=1.700e+02, percent-clipped=0.0 2023-11-20 09:44:48,550 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 11250, loss[loss=0.0704, simple_loss=0.08394, pruned_loss=0.01695, audio_tagging_loss=0.01147, over 16342.00 frames. ], tot_loss[loss=0.07996, simple_loss=0.09999, pruned_loss=0.01949, audio_tagging_loss=0.01047, over 3048168.69 frames. ], batch size: 61, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:44:51,539 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.52 vs. limit=15.0 2023-11-20 09:45:04,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1036953.3333333334, ans=0.125 2023-11-20 09:45:08,229 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 155550 2023-11-20 09:45:20,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1037020.0, ans=0.125 2023-11-20 09:45:53,989 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 11300, loss[loss=0.07662, simple_loss=0.09778, pruned_loss=0.01708, audio_tagging_loss=0.01065, over 14837.00 frames. ], tot_loss[loss=0.07925, simple_loss=0.09922, pruned_loss=0.01931, audio_tagging_loss=0.01033, over 3044571.83 frames. ], batch size: 53, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:45:56,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1037220.0, ans=0.09899494936611666 2023-11-20 09:46:13,178 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 155600 2023-11-20 09:46:14,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1037286.6666666666, ans=0.125 2023-11-20 09:46:17,707 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.64 vs. limit=15.0 2023-11-20 09:46:28,934 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.896e+01 8.236e+01 9.129e+01 9.698e+01 1.564e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-20 09:46:38,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1037420.0, ans=0.1 2023-11-20 09:46:59,332 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 11350, loss[loss=0.09182, simple_loss=0.1135, pruned_loss=0.02404, audio_tagging_loss=0.01104, over 15669.00 frames. 
], tot_loss[loss=0.07942, simple_loss=0.09957, pruned_loss=0.0194, audio_tagging_loss=0.01024, over 3049638.98 frames. ], batch size: 60, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:47:18,676 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 155650 2023-11-20 09:47:29,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1037686.6666666666, ans=0.125 2023-11-20 09:47:53,749 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2023-11-20 09:47:54,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1037820.0, ans=0.125 2023-11-20 09:47:58,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1037820.0, ans=0.2 2023-11-20 09:48:00,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1037820.0, ans=0.125 2023-11-20 09:48:04,515 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 11400, loss[loss=0.08743, simple_loss=0.1025, pruned_loss=0.02433, audio_tagging_loss=0.01185, over 14938.00 frames. ], tot_loss[loss=0.08039, simple_loss=0.1008, pruned_loss=0.01989, audio_tagging_loss=0.01011, over 3050036.06 frames. ], batch size: 58, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:48:24,534 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 155700 2023-11-20 09:48:39,689 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.071e+01 7.956e+01 8.738e+01 9.892e+01 2.201e+02, threshold=1.748e+02, percent-clipped=1.0 2023-11-20 09:48:51,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1038086.6666666666, ans=10.0 2023-11-20 09:49:06,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1038153.3333333334, ans=0.0 2023-11-20 09:49:09,130 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 11450, loss[loss=0.09615, simple_loss=0.1284, pruned_loss=0.02435, audio_tagging_loss=0.007577, over 15826.00 frames. ], tot_loss[loss=0.08034, simple_loss=0.1008, pruned_loss=0.01984, audio_tagging_loss=0.01011, over 3058090.63 frames. ], batch size: 57, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:49:28,746 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 155750 2023-11-20 09:49:32,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1038286.6666666666, ans=0.2 2023-11-20 09:49:51,942 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.14 vs. limit=22.5 2023-11-20 09:49:55,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1038420.0, ans=0.125 2023-11-20 09:50:03,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1038486.6666666666, ans=0.2 2023-11-20 09:50:13,903 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 11500, loss[loss=0.07412, simple_loss=0.09579, pruned_loss=0.01575, audio_tagging_loss=0.01048, over 15820.00 frames. 
], tot_loss[loss=0.08004, simple_loss=0.1005, pruned_loss=0.01968, audio_tagging_loss=0.0101, over 3054345.58 frames. ], batch size: 62, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:50:32,939 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 155800 2023-11-20 09:50:47,689 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.982e+01 8.235e+01 8.586e+01 9.090e+01 1.242e+02, threshold=1.717e+02, percent-clipped=0.0 2023-11-20 09:51:18,145 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 11550, loss[loss=0.07768, simple_loss=0.09653, pruned_loss=0.02209, audio_tagging_loss=0.007329, over 15274.00 frames. ], tot_loss[loss=0.07998, simple_loss=0.1002, pruned_loss=0.01988, audio_tagging_loss=0.01001, over 3050990.22 frames. ], batch size: 57, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:51:22,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1038886.6666666666, ans=0.125 2023-11-20 09:51:23,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1038886.6666666666, ans=0.2 2023-11-20 09:51:24,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1038886.6666666666, ans=0.125 2023-11-20 09:51:36,556 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 155850 2023-11-20 09:51:43,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1039020.0, ans=0.2 2023-11-20 09:51:46,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1039020.0, ans=0.125 2023-11-20 09:51:53,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1039020.0, ans=0.0 2023-11-20 09:51:53,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1039020.0, ans=0.125 2023-11-20 09:51:56,526 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 09:51:57,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1039086.6666666666, ans=0.2 2023-11-20 09:52:14,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1039153.3333333334, ans=0.1 2023-11-20 09:52:15,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1039153.3333333334, ans=0.125 2023-11-20 09:52:21,321 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 11600, loss[loss=0.0902, simple_loss=0.1221, pruned_loss=0.02102, audio_tagging_loss=0.008142, over 15612.00 frames. ], tot_loss[loss=0.0802, simple_loss=0.1004, pruned_loss=0.02, audio_tagging_loss=0.009984, over 3049664.71 frames. 
], batch size: 57, lr: 5.22e-03, grad_scale: 32.0
2023-11-20 09:52:37,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1039286.6666666666, ans=0.0
2023-11-20 09:52:41,569 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 155900
2023-11-20 09:52:46,995 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.28 vs. limit=15.0
2023-11-20 09:52:55,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1039353.3333333334, ans=0.1
2023-11-20 09:52:56,403 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.551e+01 8.203e+01 8.943e+01 9.744e+01 1.251e+02, threshold=1.789e+02, percent-clipped=0.0
2023-11-20 09:52:56,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1039353.3333333334, ans=0.2
2023-11-20 09:53:25,681 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 11650, loss[loss=0.07163, simple_loss=0.08836, pruned_loss=0.01798, audio_tagging_loss=0.009465, over 14210.00 frames. ], tot_loss[loss=0.08087, simple_loss=0.1013, pruned_loss=0.0203, audio_tagging_loss=0.009942, over 3050015.92 frames. ], batch size: 57, lr: 5.22e-03, grad_scale: 32.0
2023-11-20 09:53:45,317 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 155950
2023-11-20 09:53:59,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1039686.6666666666, ans=0.125
2023-11-20 09:54:00,258 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 09:54:20,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1039820.0, ans=0.125
2023-11-20 09:54:30,201 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.83 vs. limit=10.0
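The grad_scale field in the loss lines (32.0 here, 16.0 in earlier batches) is the mixed-precision loss scale: with fp16 enabled, the loss is multiplied by this factor before backward so small gradients survive in half precision, and the scaler halves it after an overflow and grows it back during stable stretches, which is why it bounces between 16 and 32 across this section. A standard PyTorch skeleton of that loop, assumed rather than copied from the recipe:

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

def train_step(model, batch, optimizer):
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(batch)                # forward in fp16
    optimizer.zero_grad()
    scaler.scale(loss).backward()          # backward on the scaled loss
    scaler.step(optimizer)                 # unscales; skips the step on inf/nan
    scaler.update()                        # halve on overflow, grow when stable
    return scaler.get_scale()              # the "grad_scale" that gets logged
```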
2023-11-20 09:54:30,840 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 11700, loss[loss=0.08519, simple_loss=0.1126, pruned_loss=0.02033, audio_tagging_loss=0.008583, over 16161.00 frames. ], tot_loss[loss=0.08103, simple_loss=0.1015, pruned_loss=0.0203, audio_tagging_loss=0.009965, over 3048653.11 frames. ], batch size: 60, lr: 5.22e-03, grad_scale: 32.0
2023-11-20 09:54:31,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1039886.6666666666, ans=0.1
2023-11-20 09:54:45,908 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 09:54:49,305 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 156000
2023-11-20 09:54:50,793 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-156000.pt
2023-11-20 09:55:05,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1040020.0, ans=10.0
2023-11-20 09:55:09,544 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.408e+01 8.314e+01 9.144e+01 1.029e+02 1.424e+02, threshold=1.829e+02, percent-clipped=0.0
2023-11-20 09:55:13,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1040086.6666666666, ans=0.07
2023-11-20 09:55:34,319 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.02 vs. limit=15.0
2023-11-20 09:55:38,346 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 11750, loss[loss=0.0627, simple_loss=0.07732, pruned_loss=0.01307, audio_tagging_loss=0.01098, over 15074.00 frames. ], tot_loss[loss=0.08119, simple_loss=0.1016, pruned_loss=0.02031, audio_tagging_loss=0.0101, over 3052938.09 frames. ], batch size: 56, lr: 5.22e-03, grad_scale: 32.0
2023-11-20 09:55:44,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1040220.0, ans=0.125
2023-11-20 09:55:56,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1040286.6666666666, ans=0.0
2023-11-20 09:55:58,488 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 156050
2023-11-20 09:56:01,204 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 09:56:17,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1040420.0, ans=0.125
2023-11-20 09:56:19,634 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.06 vs. limit=10.0
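The [checkpoint.py:75] entry above fires on a round batch index (156000): alongside the per-epoch epoch-N.pt files, the trainer writes a checkpoint-<batch_idx>.pt every fixed number of global batches. A minimal sketch of such a periodic save; the helper and the saved fields are illustrative, not icefall's exact checkpoint schema:

```python
import torch

def save_periodic_checkpoint(model, optimizer, batch_idx_train,
                             exp_dir, save_every_n):
    # Called once per batch; only writes on multiples of save_every_n.
    if batch_idx_train % save_every_n != 0:
        return
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "batch_idx_train": batch_idx_train,
        },
        f"{exp_dir}/checkpoint-{batch_idx_train}.pt",
    )
```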
2023-11-20 09:56:42,569 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 11800, loss[loss=0.07768, simple_loss=0.09499, pruned_loss=0.01894, audio_tagging_loss=0.01124, over 14637.00 frames. ], tot_loss[loss=0.0809, simple_loss=0.1011, pruned_loss=0.02018, audio_tagging_loss=0.01016, over 3050459.38 frames. ], batch size: 55, lr: 5.22e-03, grad_scale: 32.0
2023-11-20 09:56:48,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1040553.3333333334, ans=0.2
2023-11-20 09:57:02,653 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 156100
2023-11-20 09:57:16,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1040686.6666666666, ans=0.0
2023-11-20 09:57:17,067 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.890e+01 8.238e+01 8.781e+01 9.493e+01 1.182e+02, threshold=1.756e+02, percent-clipped=0.0
2023-11-20 09:57:27,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1040753.3333333334, ans=0.1
2023-11-20 09:57:46,286 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 11850, loss[loss=0.07636, simple_loss=0.09216, pruned_loss=0.01646, audio_tagging_loss=0.01382, over 15098.00 frames. ], tot_loss[loss=0.08094, simple_loss=0.1014, pruned_loss=0.02007, audio_tagging_loss=0.01015, over 3051822.98 frames. ], batch size: 56, lr: 5.22e-03, grad_scale: 32.0
2023-11-20 09:58:01,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1040953.3333333334, ans=0.2
2023-11-20 09:58:05,474 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 156150
2023-11-20 09:58:50,153 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 11900, loss[loss=0.06091, simple_loss=0.08641, pruned_loss=0.008318, audio_tagging_loss=0.009387, over 14280.00 frames. ], tot_loss[loss=0.08064, simple_loss=0.101, pruned_loss=0.01991, audio_tagging_loss=0.01021, over 3048949.32 frames. ], batch size: 55, lr: 5.21e-03, grad_scale: 32.0
2023-11-20 09:59:09,365 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 156200
2023-11-20 09:59:13,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1041286.6666666666, ans=0.125
2023-11-20 09:59:18,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1041353.3333333334, ans=0.07
2023-11-20 09:59:25,438 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.397e+01 8.078e+01 8.558e+01 9.294e+01 1.166e+02, threshold=1.712e+02, percent-clipped=0.0
2023-11-20 09:59:41,921 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.76 vs. limit=15.0
2023-11-20 09:59:54,106 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 11950, loss[loss=0.07743, simple_loss=0.1032, pruned_loss=0.01518, audio_tagging_loss=0.01068, over 14636.00 frames. ], tot_loss[loss=0.08004, simple_loss=0.0999, pruned_loss=0.01974, audio_tagging_loss=0.01035, over 3046286.91 frames. ], batch size: 54, lr: 5.21e-03, grad_scale: 32.0
2023-11-20 09:59:54,605 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.51 vs.
2023-11-20 10:00:03,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1041553.3333333334, ans=0.125
2023-11-20 10:00:14,327 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 156250
2023-11-20 10:00:22,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1041686.6666666666, ans=0.0
2023-11-20 10:00:25,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1041686.6666666666, ans=0.125
2023-11-20 10:00:29,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1041686.6666666666, ans=0.0
2023-11-20 10:00:42,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1041753.3333333334, ans=0.0
2023-11-20 10:00:56,377 INFO [train_asr.py:1262] (0/4) Epoch 13, batch 12000, loss[loss=0.08012, simple_loss=0.1065, pruned_loss=0.01848, audio_tagging_loss=0.008409, over 14253.00 frames. ], tot_loss[loss=0.08063, simple_loss=0.1007, pruned_loss=0.01997, audio_tagging_loss=0.01032, over 3044919.43 frames. ], batch size: 54, lr: 5.21e-03, grad_scale: 32.0
2023-11-20 10:00:56,381 INFO [train_asr.py:1285] (0/4) Computing validation loss
2023-11-20 10:01:36,803 INFO [train_asr.py:1294] (0/4) Epoch 13, validation: loss=0.0624, simple_loss=0.05383, pruned_loss=0.00582, audio_tagging_loss=0.02967, over 4681554.00 frames.
2023-11-20 10:01:36,804 INFO [train_asr.py:1295] (0/4) Maximum memory allocated so far is 25925MB
2023-11-20 10:01:38,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1041886.6666666666, ans=0.0
2023-11-20 10:01:41,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=1041886.6666666666, ans=0.05
2023-11-20 10:01:54,242 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 156300
2023-11-20 10:01:58,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1042020.0, ans=0.2
2023-11-20 10:02:05,543 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-13.pt
2023-11-20 10:02:41,396 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 0, loss[loss=0.09333, simple_loss=0.09992, pruned_loss=0.02013, audio_tagging_loss=0.02324, over 16029.00 frames. ], tot_loss[loss=0.09333, simple_loss=0.09992, pruned_loss=0.02013, audio_tagging_loss=0.02324, over 16029.00 frames. ], batch size: 61, lr: 5.02e-03, grad_scale: 32.0
2023-11-20 10:02:41,399 INFO [train_asr.py:1285] (0/4) Computing validation loss
2023-11-20 10:03:13,787 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9975, 3.1842, 2.9642, 3.1665, 3.4850, 2.7243, 3.4087, 2.6666], device='cuda:0')
2023-11-20 10:03:18,486 INFO [train_asr.py:1294] (0/4) Epoch 14, validation: loss=0.0621, simple_loss=0.05383, pruned_loss=0.005845, audio_tagging_loss=0.02934, over 4681554.00 frames.
2023-11-20 10:03:18,487 INFO [train_asr.py:1295] (0/4) Maximum memory allocated so far is 25925MB
2023-11-20 10:03:22,230 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.506e+01 8.326e+01 8.983e+01 9.877e+01 1.645e+02, threshold=1.797e+02, percent-clipped=0.0
2023-11-20 10:03:44,144 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.20 vs. limit=15.0
2023-11-20 10:04:03,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1042246.6666666666, ans=0.125
2023-11-20 10:04:12,146 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 156350
2023-11-20 10:04:13,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1042313.3333333334, ans=0.1
2023-11-20 10:04:20,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1042313.3333333334, ans=0.125
2023-11-20 10:04:23,751 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 50, loss[loss=0.08577, simple_loss=0.1039, pruned_loss=0.01805, audio_tagging_loss=0.01575, over 15088.00 frames. ], tot_loss[loss=0.08942, simple_loss=0.1001, pruned_loss=0.01979, audio_tagging_loss=0.01956, over 687131.80 frames. ], batch size: 59, lr: 5.02e-03, grad_scale: 32.0
2023-11-20 10:04:28,105 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.20 vs. limit=15.0
2023-11-20 10:04:39,611 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.47 vs. limit=15.0
2023-11-20 10:04:42,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1042446.6666666666, ans=0.125
2023-11-20 10:04:42,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1042446.6666666666, ans=0.2
2023-11-20 10:04:54,284 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.41 vs. limit=6.0
2023-11-20 10:04:58,412 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.82 vs. limit=10.0
2023-11-20 10:04:59,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1042513.3333333334, ans=0.0
2023-11-20 10:05:03,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1042580.0, ans=0.125
2023-11-20 10:05:16,924 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 156400
2023-11-20 10:05:19,048 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.55 vs. limit=15.0
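The Whitening lines compare a measured statistic of a module's output features against a limit; the Whiten modules in zipformer's scaling.py only apply a corrective gradient when the metric exceeds that limit. A rough sketch of the metric as I understand it (assumed: the ratio of the mean squared eigenvalue of the per-group feature covariance to the squared mean eigenvalue, which is 1.0 for perfectly "white" features):

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    (B, C) = x.shape
    x = x.reshape(B, num_groups, C // num_groups).transpose(0, 1)  # (G, B, C/G)
    x = x - x.mean(dim=1, keepdim=True)
    cov = torch.matmul(x.transpose(1, 2), x) / x.shape[1]          # (G, C/G, C/G)
    eigs_mean = cov.diagonal(dim1=1, dim2=2).mean(dim=1)           # trace/C = mean eigenvalue
    eigs_sq_mean = (cov ** 2).sum(dim=(1, 2)) / cov.shape[1]       # trace(C@C)/C = mean squared eigenvalue
    return (eigs_sq_mean / eigs_mean ** 2).mean()

x = torch.randn(1000, 256)   # nearly white features
print(whitening_metric(x))   # close to 1.0, far below a limit like 15.0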
2023-11-20 10:05:24,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1042646.6666666666, ans=0.125
2023-11-20 10:05:29,531 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 100, loss[loss=0.09063, simple_loss=0.1055, pruned_loss=0.02168, audio_tagging_loss=0.01619, over 15429.00 frames. ], tot_loss[loss=0.08706, simple_loss=0.09863, pruned_loss=0.01892, audio_tagging_loss=0.01882, over 1203900.45 frames. ], batch size: 55, lr: 5.02e-03, grad_scale: 32.0
2023-11-20 10:05:33,222 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.056e+01 8.681e+01 9.274e+01 1.011e+02 1.384e+02, threshold=1.855e+02, percent-clipped=0.0
2023-11-20 10:06:19,003 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.85 vs. limit=15.0
2023-11-20 10:06:20,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1042980.0, ans=0.125
2023-11-20 10:06:21,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1042980.0, ans=0.5
2023-11-20 10:06:22,013 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 156450
2023-11-20 10:06:33,104 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 150, loss[loss=0.09268, simple_loss=0.1224, pruned_loss=0.02096, audio_tagging_loss=0.0105, over 15579.00 frames. ], tot_loss[loss=0.08582, simple_loss=0.09968, pruned_loss=0.0192, audio_tagging_loss=0.01678, over 1611666.94 frames. ], batch size: 55, lr: 5.02e-03, grad_scale: 32.0
2023-11-20 10:07:07,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1043180.0, ans=0.0
2023-11-20 10:07:09,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1043180.0, ans=0.0
2023-11-20 10:07:18,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1043246.6666666666, ans=0.125
2023-11-20 10:07:27,251 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 156500
2023-11-20 10:07:37,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1043380.0, ans=0.125
2023-11-20 10:07:38,218 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 200, loss[loss=0.07034, simple_loss=0.0802, pruned_loss=0.01667, audio_tagging_loss=0.01357, over 13919.00 frames. ], tot_loss[loss=0.08414, simple_loss=0.1001, pruned_loss=0.01946, audio_tagging_loss=0.01464, over 1927655.00 frames. ], batch size: 55, lr: 5.02e-03, grad_scale: 32.0
2023-11-20 10:07:39,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1043380.0, ans=0.125
2023-11-20 10:07:42,548 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.035e+01 8.224e+01 9.022e+01 9.818e+01 1.305e+02, threshold=1.804e+02, percent-clipped=0.0
2023-11-20 10:08:14,747 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.73 vs. limit=15.0
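Each optim.py line summarizes the recent per-batch gradient norms as five quantiles (min, 25%, median, 75%, max), and in every record here the threshold equals Clipping_scale times the median (e.g. 2.0 * 9.274e+01 = 1.855e+02 just above), so the clipping threshold appears to track the running median. A sketch that reproduces this summary line from a buffer of recent norms:

import torch

def grad_norm_summary(recent_norms: torch.Tensor, clipping_scale: float = 2.0) -> str:
    q = torch.quantile(recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]                       # scale times the median
    clipped = (recent_norms > threshold).float().mean() * 100.0
    quartiles = " ".join(f"{v:.3e}" for v in q)
    return (f"grad-norm quartiles {quartiles}, "
            f"threshold={threshold:.3e}, percent-clipped={clipped:.1f}")

norms = torch.abs(torch.randn(200)) * 30.0 + 70.0  # synthetic norms near the logged range
print(grad_norm_summary(norms))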
2023-11-20 10:08:19,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1043580.0, ans=0.125
2023-11-20 10:08:23,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1043580.0, ans=0.125
2023-11-20 10:08:24,605 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.00 vs. limit=15.0
2023-11-20 10:08:29,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1043646.6666666666, ans=0.125
2023-11-20 10:08:30,224 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 10:08:31,948 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 156550
2023-11-20 10:08:43,590 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 250, loss[loss=0.07384, simple_loss=0.08482, pruned_loss=0.01934, audio_tagging_loss=0.01209, over 14339.00 frames. ], tot_loss[loss=0.08325, simple_loss=0.1007, pruned_loss=0.01972, audio_tagging_loss=0.01319, over 2177662.45 frames. ], batch size: 54, lr: 5.02e-03, grad_scale: 16.0
2023-11-20 10:09:09,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1043846.6666666666, ans=0.125
2023-11-20 10:09:16,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1043846.6666666666, ans=0.0
2023-11-20 10:09:30,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1043913.3333333334, ans=0.0
2023-11-20 10:09:30,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1043913.3333333334, ans=0.2
2023-11-20 10:09:35,213 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.85 vs. limit=22.5
2023-11-20 10:09:37,005 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 156600
2023-11-20 10:09:37,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1043980.0, ans=0.0
2023-11-20 10:09:48,897 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 300, loss[loss=0.07177, simple_loss=0.09071, pruned_loss=0.01779, audio_tagging_loss=0.008625, over 15362.00 frames. ], tot_loss[loss=0.0822, simple_loss=0.1008, pruned_loss=0.01962, audio_tagging_loss=0.01216, over 2371982.41 frames. ], batch size: 58, lr: 5.02e-03, grad_scale: 16.0
2023-11-20 10:09:54,294 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.950e+01 8.223e+01 8.932e+01 9.585e+01 1.475e+02, threshold=1.786e+02, percent-clipped=0.0
2023-11-20 10:10:03,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1044113.3333333334, ans=0.125
2023-11-20 10:10:16,514 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.81 vs. limit=22.5
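The scaling.py:213 lines each report a ScheduledFloat: a scalar hyperparameter (dropout_p, skip_rate, balancer prob, scale_min, and so on) whose current value ans is a function of batch_count. A minimal sketch of such an object, assuming piecewise-linear interpolation between (batch_count, value) knots; the knot values below are made up for illustration:

class ScheduledFloat:
    def __init__(self, *points):
        self.points = sorted(points)  # (batch_count, value) knots

    def value_at(self, batch_count: float) -> float:
        p = self.points
        if batch_count <= p[0][0]:
            return p[0][1]
        for (x0, y0), (x1, y1) in zip(p, p[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return p[-1][1]  # flat after the last knot

# e.g. a dropout decaying from 0.3 to 0.1 over the first 20k batches, then flat,
# so that late in training it logs ans=0.1 as above:
dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value_at(1043580.0))  # 0.1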
2023-11-20 10:10:42,191 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 156650
2023-11-20 10:10:48,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1044313.3333333334, ans=0.125
2023-11-20 10:10:53,858 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 350, loss[loss=0.06029, simple_loss=0.08074, pruned_loss=0.01202, audio_tagging_loss=0.007899, over 16664.00 frames. ], tot_loss[loss=0.08162, simple_loss=0.101, pruned_loss=0.01959, audio_tagging_loss=0.0115, over 2521521.95 frames. ], batch size: 63, lr: 5.02e-03, grad_scale: 4.0
2023-11-20 10:11:07,361 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=15.0
2023-11-20 10:11:09,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1044446.6666666666, ans=0.07
2023-11-20 10:11:16,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1044446.6666666666, ans=0.125
2023-11-20 10:11:20,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1044513.3333333334, ans=0.125
2023-11-20 10:11:46,514 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 156700
2023-11-20 10:11:58,260 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 400, loss[loss=0.0893, simple_loss=0.1213, pruned_loss=0.01992, audio_tagging_loss=0.008744, over 14454.00 frames. ], tot_loss[loss=0.08104, simple_loss=0.1011, pruned_loss=0.01944, audio_tagging_loss=0.01104, over 2639806.20 frames. ], batch size: 55, lr: 5.02e-03, grad_scale: 8.0
2023-11-20 10:12:01,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1044713.3333333334, ans=0.125
2023-11-20 10:12:06,229 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.716e+01 8.326e+01 8.879e+01 9.512e+01 2.019e+02, threshold=1.776e+02, percent-clipped=1.0
2023-11-20 10:12:17,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1044780.0, ans=0.0
2023-11-20 10:12:17,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1044780.0, ans=0.0
2023-11-20 10:12:37,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1044913.3333333334, ans=0.0
2023-11-20 10:12:47,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1044913.3333333334, ans=0.125
2023-11-20 10:12:52,197 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 156750
2023-11-20 10:12:53,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1044980.0, ans=0.125
2023-11-20 10:13:03,933 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 450, loss[loss=0.07615, simple_loss=0.09143, pruned_loss=0.02077, audio_tagging_loss=0.00967, over 14451.00 frames. ], tot_loss[loss=0.08037, simple_loss=0.1005, pruned_loss=0.0193, audio_tagging_loss=0.01082, over 2729512.82 frames. ], batch size: 54, lr: 5.02e-03, grad_scale: 8.0
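The grad_scale field is the fp16 loss scale, and its trajectory in this stretch (16.0 down to 4.0 around batches 250-350, then back up through 8.0) matches the standard dynamic-scaling behavior: halve on inf/nan gradients, grow back after enough clean steps. A generic sketch with PyTorch's GradScaler; the model/criterion names are placeholders, not this recipe's code:

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

def train_step(model, optimizer, batch, criterion):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(batch["inputs"]), batch["targets"])
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped (and the scale halved) on inf/nan grads
    scaler.update()
    return loss.detach(), scaler.get_scale()  # the value logged as grad_scale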
2023-11-20 10:13:11,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1045046.6666666666, ans=0.0
2023-11-20 10:13:17,651 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0
2023-11-20 10:13:48,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=1045246.6666666666, ans=0.5
2023-11-20 10:13:51,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1045246.6666666666, ans=0.125
2023-11-20 10:13:57,633 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 156800
2023-11-20 10:13:59,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1045313.3333333334, ans=0.125
2023-11-20 10:14:02,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1045313.3333333334, ans=0.0
2023-11-20 10:14:09,379 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 500, loss[loss=0.0642, simple_loss=0.07471, pruned_loss=0.01666, audio_tagging_loss=0.01018, over 14853.00 frames. ], tot_loss[loss=0.07979, simple_loss=0.09977, pruned_loss=0.01931, audio_tagging_loss=0.0106, over 2800698.90 frames. ], batch size: 59, lr: 5.02e-03, grad_scale: 8.0
2023-11-20 10:14:16,736 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.847e+01 8.307e+01 8.961e+01 9.765e+01 1.460e+02, threshold=1.792e+02, percent-clipped=0.0
2023-11-20 10:14:36,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1045513.3333333334, ans=0.0
2023-11-20 10:14:39,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1045513.3333333334, ans=0.2
2023-11-20 10:14:50,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1045580.0, ans=0.1
2023-11-20 10:14:54,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1045580.0, ans=0.1
2023-11-20 10:15:02,737 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 156850
2023-11-20 10:15:14,475 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 550, loss[loss=0.06839, simple_loss=0.0786, pruned_loss=0.01776, audio_tagging_loss=0.01133, over 15925.00 frames. ], tot_loss[loss=0.08023, simple_loss=0.1004, pruned_loss=0.01954, audio_tagging_loss=0.0105, over 2853488.46 frames. ], batch size: 61, lr: 5.01e-03, grad_scale: 8.0
2023-11-20 10:15:21,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1045713.3333333334, ans=0.125
2023-11-20 10:15:21,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1045713.3333333334, ans=0.025
2023-11-20 10:15:22,866 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 10:15:30,622 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.78 vs. limit=15.0
2023-11-20 10:15:52,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1045913.3333333334, ans=0.125
2023-11-20 10:16:08,717 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 156900
2023-11-20 10:16:15,309 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.11 vs. limit=15.0
2023-11-20 10:16:19,738 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 600, loss[loss=0.07552, simple_loss=0.09383, pruned_loss=0.01981, audio_tagging_loss=0.008794, over 15195.00 frames. ], tot_loss[loss=0.07971, simple_loss=0.09976, pruned_loss=0.01941, audio_tagging_loss=0.01042, over 2895116.08 frames. ], batch size: 56, lr: 5.01e-03, grad_scale: 8.0
2023-11-20 10:16:20,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1046046.6666666666, ans=0.0
2023-11-20 10:16:22,947 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.39 vs. limit=15.0
2023-11-20 10:16:23,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1046046.6666666666, ans=0.125
2023-11-20 10:16:27,236 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.847e+01 7.933e+01 8.592e+01 9.443e+01 1.249e+02, threshold=1.718e+02, percent-clipped=0.0
2023-11-20 10:16:31,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1046046.6666666666, ans=0.2
2023-11-20 10:16:41,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1046113.3333333334, ans=0.125
2023-11-20 10:16:42,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1046113.3333333334, ans=0.125
2023-11-20 10:16:42,845 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.73 vs. limit=15.0
2023-11-20 10:16:54,696 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.97 vs. limit=10.0
2023-11-20 10:17:05,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1046246.6666666666, ans=0.125
2023-11-20 10:17:13,201 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 156950
2023-11-20 10:17:24,851 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 650, loss[loss=0.09995, simple_loss=0.1293, pruned_loss=0.02532, audio_tagging_loss=0.009994, over 15606.00 frames. ], tot_loss[loss=0.0812, simple_loss=0.1018, pruned_loss=0.01994, audio_tagging_loss=0.01036, over 2934251.49 frames. ], batch size: 57, lr: 5.01e-03, grad_scale: 8.0
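The simple_loss/pruned_loss pair reported in each epoch line is characteristic of k2's pruned RNN-T: a cheap "simple" transducer loss whose gradients select a narrow band of symbol positions per frame, and a full joiner loss evaluated only inside that band. A sketch of the usual call sequence (argument names follow k2's rnnt_loss module; the tensors and the lm/am scales here are stand-ins and assumed values, not this recipe's exact code):

import k2

def transducer_losses(am, lm, joiner, symbols, boundary, blank_id=0, prune_range=5):
    # Simple loss over the full (T, S) grid, cheap enough to also return grads.
    simple_loss, (px_grad, py_grad) = k2.rnnt_loss_smoothed(
        lm=lm, am=am, symbols=symbols, termination_symbol=blank_id,
        lm_only_scale=0.25, am_only_scale=0.0, boundary=boundary,
        reduction="sum", return_grad=True)
    # Use those grads to pick a band of prune_range symbols per frame.
    ranges = k2.get_rnnt_prune_ranges(px_grad=px_grad, py_grad=py_grad,
                                      boundary=boundary, s_range=prune_range)
    am_pruned, lm_pruned = k2.do_rnnt_pruning(am=am, lm=lm, ranges=ranges)
    logits = joiner(am_pruned, lm_pruned)
    pruned_loss = k2.rnnt_loss_pruned(logits=logits, symbols=symbols,
                                      ranges=ranges, termination_symbol=blank_id,
                                      boundary=boundary, reduction="sum")
    return simple_loss, pruned_loss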
2023-11-20 10:17:45,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1046446.6666666666, ans=0.125
2023-11-20 10:17:57,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1046513.3333333334, ans=0.125
2023-11-20 10:18:03,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1046580.0, ans=0.125
2023-11-20 10:18:07,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=1046580.0, ans=10.0
2023-11-20 10:18:12,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1046580.0, ans=0.0
2023-11-20 10:18:14,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1046580.0, ans=0.0
2023-11-20 10:18:18,348 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 157000
2023-11-20 10:18:20,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1046646.6666666666, ans=0.0
2023-11-20 10:18:20,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1046646.6666666666, ans=0.1
2023-11-20 10:18:26,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1046646.6666666666, ans=0.2
2023-11-20 10:18:28,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1046646.6666666666, ans=0.125
2023-11-20 10:18:30,354 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 700, loss[loss=0.06148, simple_loss=0.07163, pruned_loss=0.0142, audio_tagging_loss=0.01147, over 14956.00 frames. ], tot_loss[loss=0.08107, simple_loss=0.1019, pruned_loss=0.01985, audio_tagging_loss=0.01025, over 2968606.17 frames. ], batch size: 58, lr: 5.01e-03, grad_scale: 8.0
2023-11-20 10:18:38,466 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.004e+01 8.472e+01 9.225e+01 1.029e+02 2.197e+02, threshold=1.845e+02, percent-clipped=1.0
2023-11-20 10:18:51,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1046780.0, ans=0.125
2023-11-20 10:18:54,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1046846.6666666666, ans=0.125
2023-11-20 10:19:01,623 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.79 vs. limit=22.5
2023-11-20 10:19:09,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1046913.3333333334, ans=0.1
2023-11-20 10:19:11,697 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.44 vs. limit=15.0
2023-11-20 10:19:23,875 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 157050
2023-11-20 10:19:26,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1046980.0, ans=0.2
2023-11-20 10:19:29,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1046980.0, ans=0.0
2023-11-20 10:19:35,557 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 750, loss[loss=0.09114, simple_loss=0.1095, pruned_loss=0.02568, audio_tagging_loss=0.01071, over 13744.00 frames. ], tot_loss[loss=0.0805, simple_loss=0.1011, pruned_loss=0.01962, audio_tagging_loss=0.01032, over 2989227.35 frames. ], batch size: 54, lr: 5.01e-03, grad_scale: 8.0
2023-11-20 10:19:39,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1047046.6666666666, ans=0.2
2023-11-20 10:19:42,379 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.60 vs. limit=15.0
2023-11-20 10:20:29,179 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 157100
2023-11-20 10:20:40,822 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 800, loss[loss=0.09397, simple_loss=0.1275, pruned_loss=0.02213, audio_tagging_loss=0.008104, over 15439.00 frames. ], tot_loss[loss=0.08003, simple_loss=0.1005, pruned_loss=0.01936, audio_tagging_loss=0.01044, over 3001741.87 frames. ], batch size: 55, lr: 5.01e-03, grad_scale: 16.0
2023-11-20 10:20:49,081 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 8.237e+01 8.953e+01 9.687e+01 1.353e+02, threshold=1.791e+02, percent-clipped=0.0
2023-11-20 10:21:07,355 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.99 vs. limit=22.5
2023-11-20 10:21:20,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1047580.0, ans=0.125
2023-11-20 10:21:31,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1047580.0, ans=0.125
2023-11-20 10:21:34,729 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 157150
2023-11-20 10:21:46,961 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 850, loss[loss=0.05684, simple_loss=0.06783, pruned_loss=0.01342, audio_tagging_loss=0.009504, over 13781.00 frames. ], tot_loss[loss=0.0803, simple_loss=0.1011, pruned_loss=0.01944, audio_tagging_loss=0.0103, over 3012699.96 frames. ], batch size: 54, lr: 5.01e-03, grad_scale: 16.0
2023-11-20 10:22:08,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1047780.0, ans=0.125
2023-11-20 10:22:19,510 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.13 vs. limit=22.5
2023-11-20 10:22:23,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1047913.3333333334, ans=0.1
2023-11-20 10:22:25,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1047913.3333333334, ans=0.125
2023-11-20 10:22:27,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1047913.3333333334, ans=0.0
2023-11-20 10:22:39,795 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 157200
2023-11-20 10:22:51,808 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 900, loss[loss=0.06602, simple_loss=0.08461, pruned_loss=0.0129, audio_tagging_loss=0.01082, over 15197.00 frames. ], tot_loss[loss=0.08051, simple_loss=0.1012, pruned_loss=0.01958, audio_tagging_loss=0.01036, over 3025310.89 frames. ], batch size: 56, lr: 5.01e-03, grad_scale: 16.0
2023-11-20 10:22:52,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1048046.6666666666, ans=0.125
2023-11-20 10:22:59,174 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.565e+01 8.407e+01 9.404e+01 1.035e+02 1.329e+02, threshold=1.881e+02, percent-clipped=0.0
2023-11-20 10:23:44,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1048313.3333333334, ans=0.0
2023-11-20 10:23:45,681 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 157250
2023-11-20 10:23:46,466 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.14 vs. limit=15.0
2023-11-20 10:23:47,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1048313.3333333334, ans=0.0
2023-11-20 10:23:55,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1048380.0, ans=0.125
2023-11-20 10:23:57,282 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 950, loss[loss=0.09773, simple_loss=0.1354, pruned_loss=0.02492, audio_tagging_loss=0.0051, over 14739.00 frames. ], tot_loss[loss=0.08121, simple_loss=0.1026, pruned_loss=0.01979, audio_tagging_loss=0.01012, over 3040029.95 frames. ], batch size: 53, lr: 5.01e-03, grad_scale: 16.0
2023-11-20 10:24:00,162 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 10:24:03,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1048380.0, ans=0.125
2023-11-20 10:24:11,904 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 10:24:16,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1048446.6666666666, ans=0.0
2023-11-20 10:24:25,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1048513.3333333334, ans=0.1
2023-11-20 10:24:26,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1048513.3333333334, ans=0.125
2023-11-20 10:24:50,725 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 157300
2023-11-20 10:25:01,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1048713.3333333333, ans=0.0
2023-11-20 10:25:02,404 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 1000, loss[loss=0.07515, simple_loss=0.09492, pruned_loss=0.02019, audio_tagging_loss=0.007498, over 14675.00 frames. ], tot_loss[loss=0.08107, simple_loss=0.1023, pruned_loss=0.01994, audio_tagging_loss=0.009952, over 3042016.94 frames. ], batch size: 55, lr: 5.01e-03, grad_scale: 16.0
2023-11-20 10:25:10,463 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.129e+01 7.769e+01 8.550e+01 9.087e+01 1.228e+02, threshold=1.710e+02, percent-clipped=0.0
2023-11-20 10:25:12,626 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.97 vs. limit=15.0
2023-11-20 10:25:13,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1048713.3333333333, ans=0.125
2023-11-20 10:25:21,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1048780.0, ans=0.125
2023-11-20 10:25:29,938 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 10:25:39,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1048846.6666666667, ans=0.0
2023-11-20 10:25:43,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1048913.3333333333, ans=0.025
2023-11-20 10:25:54,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1048980.0, ans=0.125
2023-11-20 10:25:56,447 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 157350
2023-11-20 10:25:56,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1048980.0, ans=0.125
2023-11-20 10:26:00,755 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.71 vs. limit=15.0
2023-11-20 10:26:08,088 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 1050, loss[loss=0.06894, simple_loss=0.09195, pruned_loss=0.01435, audio_tagging_loss=0.008616, over 14767.00 frames. ], tot_loss[loss=0.08098, simple_loss=0.1023, pruned_loss=0.01993, audio_tagging_loss=0.009912, over 3042944.05 frames. ], batch size: 54, lr: 5.01e-03, grad_scale: 16.0
2023-11-20 10:26:08,873 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.70 vs. limit=22.5
2023-11-20 10:26:16,406 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.97 vs. limit=15.0
2023-11-20 10:26:30,583 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.37 vs. limit=15.0
2023-11-20 10:26:34,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1049180.0, ans=0.125
2023-11-20 10:26:54,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1049246.6666666667, ans=0.0
2023-11-20 10:27:01,913 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 157400
2023-11-20 10:27:02,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1049313.3333333333, ans=0.2
2023-11-20 10:27:13,302 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 1100, loss[loss=0.07104, simple_loss=0.09156, pruned_loss=0.01428, audio_tagging_loss=0.01098, over 15430.00 frames. ], tot_loss[loss=0.07964, simple_loss=0.1004, pruned_loss=0.01945, audio_tagging_loss=0.009983, over 3045472.26 frames. ], batch size: 59, lr: 5.01e-03, grad_scale: 16.0
2023-11-20 10:27:17,865 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
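These warnings show the cut-validity filter at work: the excluded AudioSet cuts are 1-second placeholders whose 100 input frames shrink to 23 frames after subsampling, which cannot cover their 24 BPE tokens. A sketch of the predicate, where the subsampling arithmetic is assumed from the logged 100-to-23 mapping rather than copied from the recipe:

def keep_cut(num_frames_before_subsampling: int, num_tokens: int) -> bool:
    # Assumed zipformer front-end: conv subsampling then a further 2x reduction.
    t = (num_frames_before_subsampling - 7) // 2
    t = (t + 1) // 2
    # A transducer alignment needs at least one frame per output token.
    return t >= num_tokens

print(keep_cut(100, 24))  # False -> "Exclude cut ..." as logged (23 < 24)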
2023-11-20 10:27:21,541 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.626e+01 8.064e+01 8.696e+01 9.715e+01 1.230e+02, threshold=1.739e+02, percent-clipped=0.0
2023-11-20 10:27:44,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1049513.3333333333, ans=0.2
2023-11-20 10:28:07,901 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 157450
2023-11-20 10:28:18,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1049713.3333333333, ans=0.2
2023-11-20 10:28:19,104 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 1150, loss[loss=0.09324, simple_loss=0.1218, pruned_loss=0.0219, audio_tagging_loss=0.01042, over 15050.00 frames. ], tot_loss[loss=0.07999, simple_loss=0.101, pruned_loss=0.01954, audio_tagging_loss=0.009933, over 3051685.32 frames. ], batch size: 55, lr: 5.00e-03, grad_scale: 16.0
2023-11-20 10:28:32,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1049780.0, ans=0.1
2023-11-20 10:28:36,804 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.84 vs. limit=6.0
2023-11-20 10:28:37,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1049780.0, ans=0.1
2023-11-20 10:28:41,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1049780.0, ans=0.125
2023-11-20 10:28:45,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1049846.6666666667, ans=0.125
2023-11-20 10:28:51,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1049846.6666666667, ans=0.125
2023-11-20 10:28:59,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1049913.3333333333, ans=0.0
2023-11-20 10:29:14,246 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 157500
2023-11-20 10:29:15,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1049980.0, ans=0.05
2023-11-20 10:29:26,537 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 1200, loss[loss=0.07829, simple_loss=0.1023, pruned_loss=0.01834, audio_tagging_loss=0.008784, over 15174.00 frames. ], tot_loss[loss=0.08013, simple_loss=0.101, pruned_loss=0.01973, audio_tagging_loss=0.009902, over 3045603.59 frames. ], batch size: 57, lr: 5.00e-03, grad_scale: 32.0
2023-11-20 10:29:33,303 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.37 vs. limit=15.0
2023-11-20 10:29:33,804 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.234e+01 8.296e+01 8.818e+01 9.703e+01 1.332e+02, threshold=1.764e+02, percent-clipped=0.0
2023-11-20 10:29:52,156 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.38 vs. limit=22.5
2023-11-20 10:29:53,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1050180.0, ans=0.1
2023-11-20 10:30:05,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1050246.6666666667, ans=0.2
2023-11-20 10:30:13,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1050246.6666666667, ans=0.2
2023-11-20 10:30:19,791 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 157550
2023-11-20 10:30:19,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1050313.3333333333, ans=0.125
2023-11-20 10:30:31,188 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 1250, loss[loss=0.06811, simple_loss=0.08934, pruned_loss=0.0131, audio_tagging_loss=0.01034, over 16844.00 frames. ], tot_loss[loss=0.07999, simple_loss=0.101, pruned_loss=0.01972, audio_tagging_loss=0.009752, over 3040605.57 frames. ], batch size: 63, lr: 5.00e-03, grad_scale: 32.0
2023-11-20 10:30:51,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1050446.6666666667, ans=0.04949747468305833
2023-11-20 10:31:01,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1050513.3333333333, ans=0.0
2023-11-20 10:31:24,028 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 157600
2023-11-20 10:31:31,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1050646.6666666667, ans=0.125
2023-11-20 10:31:32,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1050646.6666666667, ans=0.1
2023-11-20 10:31:35,691 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 1300, loss[loss=0.09692, simple_loss=0.1232, pruned_loss=0.02746, audio_tagging_loss=0.007869, over 14582.00 frames. ], tot_loss[loss=0.08038, simple_loss=0.1016, pruned_loss=0.01978, audio_tagging_loss=0.009788, over 3043033.78 frames. ], batch size: 56, lr: 5.00e-03, grad_scale: 32.0
2023-11-20 10:31:39,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1050713.3333333333, ans=0.1
2023-11-20 10:31:43,827 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.152e+01 8.044e+01 8.712e+01 9.271e+01 1.350e+02, threshold=1.742e+02, percent-clipped=0.0
2023-11-20 10:31:50,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1050780.0, ans=0.2
2023-11-20 10:31:51,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1050780.0, ans=0.125
2023-11-20 10:31:57,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1050780.0, ans=0.125
2023-11-20 10:31:57,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1050780.0, ans=0.1
2023-11-20 10:32:10,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1050846.6666666667, ans=0.2
2023-11-20 10:32:11,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1050846.6666666667, ans=0.0
2023-11-20 10:32:17,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1050913.3333333333, ans=0.125
2023-11-20 10:32:23,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1050913.3333333333, ans=0.1
2023-11-20 10:32:23,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1050913.3333333333, ans=0.125
2023-11-20 10:32:29,182 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 157650
2023-11-20 10:32:34,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1050980.0, ans=0.0
2023-11-20 10:32:40,746 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 1350, loss[loss=0.06623, simple_loss=0.0854, pruned_loss=0.01579, audio_tagging_loss=0.007736, over 14274.00 frames. ], tot_loss[loss=0.08046, simple_loss=0.1019, pruned_loss=0.01979, audio_tagging_loss=0.009717, over 3046305.91 frames. ], batch size: 53, lr: 5.00e-03, grad_scale: 16.0
2023-11-20 10:32:41,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1051046.6666666667, ans=0.0
2023-11-20 10:32:45,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1051046.6666666667, ans=0.0
2023-11-20 10:32:55,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1051113.3333333333, ans=0.015
2023-11-20 10:32:55,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1051113.3333333333, ans=0.0
2023-11-20 10:33:22,329 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.52 vs. limit=15.0
2023-11-20 10:33:28,899 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 10:33:32,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1051313.3333333333, ans=0.1
2023-11-20 10:33:34,534 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 157700
2023-11-20 10:33:41,967 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.50 vs. limit=15.0
2023-11-20 10:33:46,183 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 1400, loss[loss=0.06395, simple_loss=0.07908, pruned_loss=0.01284, audio_tagging_loss=0.01157, over 16061.00 frames. ], tot_loss[loss=0.08026, simple_loss=0.1015, pruned_loss=0.01971, audio_tagging_loss=0.009791, over 3048438.09 frames. ], batch size: 61, lr: 5.00e-03, grad_scale: 16.0
2023-11-20 10:33:55,354 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.246e+01 8.256e+01 8.950e+01 9.563e+01 1.336e+02, threshold=1.790e+02, percent-clipped=0.0
2023-11-20 10:34:00,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1051446.6666666667, ans=0.0
2023-11-20 10:34:12,319 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.39 vs. limit=22.5
2023-11-20 10:34:22,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1051513.3333333333, ans=0.0
2023-11-20 10:34:31,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1051580.0, ans=0.07
2023-11-20 10:34:38,692 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 157750
2023-11-20 10:34:38,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1051646.6666666667, ans=0.0
2023-11-20 10:34:39,485 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.80 vs. limit=10.0
2023-11-20 10:34:49,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1051713.3333333333, ans=0.2
2023-11-20 10:34:50,301 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 1450, loss[loss=0.08111, simple_loss=0.1028, pruned_loss=0.01962, audio_tagging_loss=0.0101, over 15048.00 frames. ], tot_loss[loss=0.07952, simple_loss=0.1004, pruned_loss=0.01941, audio_tagging_loss=0.009894, over 3047484.57 frames. ], batch size: 56, lr: 5.00e-03, grad_scale: 16.0
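The tot_loss[...] fields are frame-weighted running averages: each batch contributes its per-frame losses weighted by its frame count, and the "over N frames" figure is the accumulated weight. A minimal sketch of that bookkeeping (assumed; icefall keeps these statistics in a MetricsTracker and periodically resets the window):

class LossTracker:
    def __init__(self):
        self.frames = 0.0
        self.loss_sum = 0.0

    def update(self, loss_per_frame: float, batch_frames: float) -> None:
        # Weight each batch by its number of frames.
        self.frames += batch_frames
        self.loss_sum += loss_per_frame * batch_frames

    @property
    def avg(self) -> float:
        return self.loss_sum / self.frames  # the value printed in tot_loss[...]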
], batch size: 56, lr: 5.00e-03, grad_scale: 16.0 2023-11-20 10:34:50,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1051713.3333333333, ans=0.125 2023-11-20 10:35:36,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1051913.3333333333, ans=0.025 2023-11-20 10:35:43,428 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 157800 2023-11-20 10:35:47,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1051980.0, ans=0.2 2023-11-20 10:35:55,558 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 1500, loss[loss=0.08241, simple_loss=0.1056, pruned_loss=0.01858, audio_tagging_loss=0.01102, over 14587.00 frames. ], tot_loss[loss=0.0797, simple_loss=0.1001, pruned_loss=0.01959, audio_tagging_loss=0.01005, over 3039863.48 frames. ], batch size: 53, lr: 5.00e-03, grad_scale: 16.0 2023-11-20 10:36:03,305 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.39 vs. limit=6.0 2023-11-20 10:36:04,965 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.891e+01 8.201e+01 8.884e+01 9.746e+01 1.400e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-20 10:36:05,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1052046.6666666667, ans=0.125 2023-11-20 10:36:05,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1052046.6666666667, ans=0.0 2023-11-20 10:36:12,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1052113.3333333333, ans=0.125 2023-11-20 10:36:17,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1052113.3333333333, ans=0.0 2023-11-20 10:36:18,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1052113.3333333333, ans=0.1 2023-11-20 10:36:42,559 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.97 vs. limit=15.0 2023-11-20 10:36:50,150 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 157850 2023-11-20 10:36:50,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1052313.3333333333, ans=0.0 2023-11-20 10:36:59,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1052313.3333333333, ans=0.125 2023-11-20 10:37:01,340 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 1550, loss[loss=0.09348, simple_loss=0.1125, pruned_loss=0.02752, audio_tagging_loss=0.009723, over 15480.00 frames. ], tot_loss[loss=0.08006, simple_loss=0.1004, pruned_loss=0.01965, audio_tagging_loss=0.0102, over 3035242.82 frames. 
], batch size: 56, lr: 5.00e-03, grad_scale: 16.0 2023-11-20 10:37:30,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1052513.3333333333, ans=0.1 2023-11-20 10:37:47,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1052580.0, ans=0.035 2023-11-20 10:37:55,494 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 157900 2023-11-20 10:37:56,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1052646.6666666667, ans=0.125 2023-11-20 10:38:07,092 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 1600, loss[loss=0.08438, simple_loss=0.1016, pruned_loss=0.02199, audio_tagging_loss=0.01157, over 14312.00 frames. ], tot_loss[loss=0.08003, simple_loss=0.1002, pruned_loss=0.01965, audio_tagging_loss=0.01026, over 3040042.25 frames. ], batch size: 56, lr: 5.00e-03, grad_scale: 32.0 2023-11-20 10:38:15,677 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.476e+01 8.170e+01 8.883e+01 9.558e+01 1.180e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-20 10:38:26,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1052780.0, ans=0.125 2023-11-20 10:38:38,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1052846.6666666667, ans=0.0 2023-11-20 10:38:46,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1052913.3333333333, ans=0.125 2023-11-20 10:38:51,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1052913.3333333333, ans=0.125 2023-11-20 10:38:53,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1052913.3333333333, ans=0.0 2023-11-20 10:39:00,475 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 157950 2023-11-20 10:39:12,306 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 1650, loss[loss=0.08206, simple_loss=0.09968, pruned_loss=0.02296, audio_tagging_loss=0.009255, over 14855.00 frames. ], tot_loss[loss=0.08001, simple_loss=0.1003, pruned_loss=0.01953, audio_tagging_loss=0.01033, over 3047595.21 frames. ], batch size: 55, lr: 5.00e-03, grad_scale: 32.0 2023-11-20 10:39:49,996 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.93 vs. limit=15.0 2023-11-20 10:40:06,101 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 158000 2023-11-20 10:40:06,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1053313.3333333333, ans=0.0 2023-11-20 10:40:13,284 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.29 vs. limit=10.0 2023-11-20 10:40:17,625 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 1700, loss[loss=0.0749, simple_loss=0.09362, pruned_loss=0.01918, audio_tagging_loss=0.008914, over 16559.00 frames. ], tot_loss[loss=0.08019, simple_loss=0.1003, pruned_loss=0.01973, audio_tagging_loss=0.01029, over 3048646.38 frames. 
2023-11-20 10:40:26,719 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.487e+01 8.158e+01 8.615e+01 9.243e+01 1.140e+02, threshold=1.723e+02, percent-clipped=0.0
2023-11-20 10:40:28,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1053380.0, ans=0.1
2023-11-20 10:40:54,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1053513.3333333333, ans=0.125
2023-11-20 10:41:04,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1053580.0, ans=0.125
2023-11-20 10:41:10,269 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 158050
2023-11-20 10:41:10,953 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.03 vs. limit=15.0
2023-11-20 10:41:18,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=1053646.6666666667, ans=10.0
2023-11-20 10:41:22,448 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 1750, loss[loss=0.1077, simple_loss=0.144, pruned_loss=0.02697, audio_tagging_loss=0.008677, over 16592.00 frames. ], tot_loss[loss=0.0804, simple_loss=0.1011, pruned_loss=0.01974, audio_tagging_loss=0.01013, over 3052160.69 frames. ], batch size: 58, lr: 5.00e-03, grad_scale: 32.0
2023-11-20 10:41:40,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1053780.0, ans=0.125
2023-11-20 10:42:07,239 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=15.0
2023-11-20 10:42:15,709 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 158100
2023-11-20 10:42:17,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1053980.0, ans=0.1
2023-11-20 10:42:24,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1053980.0, ans=0.2
2023-11-20 10:42:27,190 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 1800, loss[loss=0.05531, simple_loss=0.07157, pruned_loss=0.01007, audio_tagging_loss=0.009453, over 15165.00 frames. ], tot_loss[loss=0.08062, simple_loss=0.1014, pruned_loss=0.01994, audio_tagging_loss=0.009989, over 3049425.22 frames. ], batch size: 58, lr: 4.99e-03, grad_scale: 16.0
2023-11-20 10:42:37,567 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.937e+01 8.074e+01 8.907e+01 9.490e+01 1.284e+02, threshold=1.781e+02, percent-clipped=0.0
2023-11-20 10:43:20,189 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 158150
2023-11-20 10:43:29,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1054313.3333333333, ans=0.125
2023-11-20 10:43:31,677 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 1850, loss[loss=0.118, simple_loss=0.1646, pruned_loss=0.03149, audio_tagging_loss=0.004228, over 16250.00 frames. ], tot_loss[loss=0.08069, simple_loss=0.1016, pruned_loss=0.01993, audio_tagging_loss=0.009941, over 3051610.44 frames. ], batch size: 59, lr: 4.99e-03, grad_scale: 16.0
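
The optim.py:476 entries summarize recent gradient norms as five quartiles (min, 25%, median, 75%, max) together with a clipping threshold. The logged numbers fit threshold = Clipping_scale x median; a quick check against the first optim.py entry above (this reconstruction is inferred from the logged values only):

    import numpy as np

    # Quartile summary (min, 25%, median, 75%, max) of recent grad norms.
    grad_norms = np.array([64.87, 81.58, 86.15, 92.43, 114.0])
    clipping_scale = 2.0

    threshold = clipping_scale * np.median(grad_norms)
    print(round(float(threshold), 1))                      # 172.3, the logged threshold
    print(100.0 * float(np.mean(grad_norms > threshold)))  # 0.0 -> percent-clipped=0.0
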
2023-11-20 10:43:43,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1054446.6666666667, ans=0.125
2023-11-20 10:43:43,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1054446.6666666667, ans=0.1
2023-11-20 10:43:49,691 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.27 vs. limit=15.0
2023-11-20 10:43:55,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1054446.6666666667, ans=0.1
2023-11-20 10:44:19,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1054580.0, ans=0.125
2023-11-20 10:44:25,171 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 158200
2023-11-20 10:44:37,037 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 1900, loss[loss=0.08058, simple_loss=0.1016, pruned_loss=0.0185, audio_tagging_loss=0.01128, over 14932.00 frames. ], tot_loss[loss=0.08041, simple_loss=0.1014, pruned_loss=0.0198, audio_tagging_loss=0.009893, over 3057753.18 frames. ], batch size: 55, lr: 4.99e-03, grad_scale: 16.0
2023-11-20 10:44:47,497 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.693e+01 8.196e+01 8.901e+01 9.660e+01 1.214e+02, threshold=1.780e+02, percent-clipped=0.0
2023-11-20 10:45:13,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1054846.6666666667, ans=0.125
2023-11-20 10:45:22,913 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.47 vs. limit=22.5
2023-11-20 10:45:30,915 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 158250
2023-11-20 10:45:42,691 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 1950, loss[loss=0.07451, simple_loss=0.09868, pruned_loss=0.01544, audio_tagging_loss=0.009727, over 14873.00 frames. ], tot_loss[loss=0.08036, simple_loss=0.1016, pruned_loss=0.01971, audio_tagging_loss=0.009831, over 3065668.98 frames. ], batch size: 56, lr: 4.99e-03, grad_scale: 16.0
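
The scaling.py:1022 Whitening entries compare a per-module statistic against a limit (for example metric=20.47 vs. limit=22.5 above). A hedged reconstruction of what such a metric measures: a quantity that equals 1.0 when a group of feature channels has identity-like covariance and grows with anisotropy, with a penalty applied only when it exceeds the limit. The exact normalization in scaling.py may differ; this sketch only illustrates the metric-versus-limit idea:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels). Compare the mean squared eigenvalue
        # of the covariance with the squared mean eigenvalue; the ratio is
        # >= 1 and equals 1 only for a perfectly "white" covariance.
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return (eigs ** 2).mean() / eigs.mean() ** 2

    x = torch.randn(1000, 128)          # near-white random features
    print(float(whitening_metric(x)))   # close to 1.0, well under a limit like 6.0
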
2023-11-20 10:45:44,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1055046.6666666667, ans=0.125
2023-11-20 10:45:51,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1055046.6666666667, ans=0.125
2023-11-20 10:46:02,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1055113.3333333333, ans=0.0
2023-11-20 10:46:13,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1055180.0, ans=0.125
2023-11-20 10:46:29,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1055246.6666666667, ans=0.0
2023-11-20 10:46:35,240 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 10:46:36,405 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 158300
2023-11-20 10:46:47,903 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 2000, loss[loss=0.1081, simple_loss=0.1249, pruned_loss=0.03172, audio_tagging_loss=0.01394, over 16248.00 frames. ], tot_loss[loss=0.08023, simple_loss=0.1013, pruned_loss=0.01971, audio_tagging_loss=0.009845, over 3060941.20 frames. ], batch size: 63, lr: 4.99e-03, grad_scale: 32.0
2023-11-20 10:46:49,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1055380.0, ans=0.125
2023-11-20 10:46:50,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1055380.0, ans=0.125
2023-11-20 10:46:57,727 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.488e+01 8.029e+01 8.442e+01 9.197e+01 1.092e+02, threshold=1.688e+02, percent-clipped=0.0
2023-11-20 10:47:41,053 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 158350
2023-11-20 10:47:52,041 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 2050, loss[loss=0.1192, simple_loss=0.1513, pruned_loss=0.03553, audio_tagging_loss=0.008065, over 16277.00 frames. ], tot_loss[loss=0.08061, simple_loss=0.1016, pruned_loss=0.01994, audio_tagging_loss=0.009875, over 3057436.96 frames. ], batch size: 57, lr: 4.99e-03, grad_scale: 32.0
2023-11-20 10:48:09,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1055780.0, ans=0.2
2023-11-20 10:48:45,963 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 158400
2023-11-20 10:48:50,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1055980.0, ans=0.0
2023-11-20 10:48:58,041 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 2100, loss[loss=0.09044, simple_loss=0.116, pruned_loss=0.02081, audio_tagging_loss=0.01164, over 16128.00 frames. ], tot_loss[loss=0.07969, simple_loss=0.1002, pruned_loss=0.01963, audio_tagging_loss=0.009968, over 3057043.49 frames. ], batch size: 59, lr: 4.99e-03, grad_scale: 32.0
2023-11-20 10:49:08,129 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.69 vs. limit=10.0
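
grad_scale in the tot_loss entries moves between 8.0, 16.0 and 32.0 over this section, the signature of dynamic fp16 loss scaling: the scale doubles after a run of overflow-free steps and is halved when gradients overflow. A generic sketch with PyTorch's GradScaler (the initial scale and growth interval are illustrative, not this run's settings):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_factor=2.0,
                                       backoff_factor=0.5, growth_interval=500)

    def training_step(model, optimizer, batch, criterion):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = criterion(model(batch["inputs"]), batch["targets"])
        scaler.scale(loss).backward()
        scaler.step(optimizer)  # skipped internally if any gradient overflowed
        scaler.update()         # doubles the scale after enough clean steps,
                                # halves it on overflow
        return scaler.get_scale()

A drop in the logged grad_scale (e.g. the later fall to 8.0) therefore points at an overflowing batch rather than a scheduled change.
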
2023-11-20 10:49:08,621 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.102e+01 8.360e+01 8.882e+01 9.714e+01 1.219e+02, threshold=1.776e+02, percent-clipped=0.0
2023-11-20 10:49:44,105 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.08 vs. limit=22.5
2023-11-20 10:49:47,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1056246.6666666667, ans=0.125
2023-11-20 10:49:51,967 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 158450
2023-11-20 10:49:53,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1056313.3333333333, ans=0.0
2023-11-20 10:49:56,238 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.04 vs. limit=15.0
2023-11-20 10:50:02,849 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 2150, loss[loss=0.06869, simple_loss=0.09881, pruned_loss=0.01374, audio_tagging_loss=0.005553, over 15448.00 frames. ], tot_loss[loss=0.08014, simple_loss=0.1011, pruned_loss=0.01974, audio_tagging_loss=0.009858, over 3057173.76 frames. ], batch size: 58, lr: 4.99e-03, grad_scale: 32.0
2023-11-20 10:50:16,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1056446.6666666667, ans=0.1
2023-11-20 10:50:28,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1056513.3333333333, ans=0.125
2023-11-20 10:50:38,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1056513.3333333333, ans=0.0
2023-11-20 10:50:42,493 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 10:50:54,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1056646.6666666667, ans=0.125
2023-11-20 10:50:56,678 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 158500
2023-11-20 10:50:58,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1056646.6666666667, ans=0.07
2023-11-20 10:50:58,705 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.57 vs. limit=22.5
2023-11-20 10:51:07,683 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 2200, loss[loss=0.06835, simple_loss=0.0802, pruned_loss=0.01714, audio_tagging_loss=0.01111, over 14950.00 frames. ], tot_loss[loss=0.08088, simple_loss=0.1023, pruned_loss=0.02001, audio_tagging_loss=0.009719, over 3056043.56 frames. ], batch size: 57, lr: 4.99e-03, grad_scale: 32.0
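
The WARNING above drops a one-second AudioSet clip whose placeholder transcript has more BPE tokens (24) than the encoder will emit frames after subsampling (23). A sketch of the filter, assuming the usual transducer constraint of at least one encoder frame per target token; the subsampling arithmetic is chosen to match the logged 100 -> 23 pair, and the function names are illustrative:

    def frames_after_subsampling(num_frames: int) -> int:
        # Matches the logged pair (100 before -> 23 after) for a subsampling
        # factor of 4; the real convolutional front end may compute this
        # slightly differently.
        return (num_frames - 8) // 4

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Transducer-style constraint: at least one encoder frame per token.
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # 23
    print(keep_cut(100, 24))              # False -> the cut is excluded
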
2023-11-20 10:51:19,261 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.870e+01 8.263e+01 8.832e+01 9.449e+01 1.423e+02, threshold=1.766e+02, percent-clipped=0.0
2023-11-20 10:51:25,043 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.79 vs. limit=6.0
2023-11-20 10:52:00,275 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 158550
2023-11-20 10:52:12,217 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 2250, loss[loss=0.06967, simple_loss=0.09088, pruned_loss=0.01716, audio_tagging_loss=0.007071, over 15573.00 frames. ], tot_loss[loss=0.08044, simple_loss=0.1015, pruned_loss=0.01987, audio_tagging_loss=0.009824, over 3054215.70 frames. ], batch size: 57, lr: 4.99e-03, grad_scale: 8.0
2023-11-20 10:52:14,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1057046.6666666667, ans=0.125
2023-11-20 10:52:16,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1057046.6666666667, ans=0.1
2023-11-20 10:52:17,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1057046.6666666667, ans=0.1
2023-11-20 10:52:49,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1057180.0, ans=0.2
2023-11-20 10:53:00,055 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 10:53:01,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1057246.6666666667, ans=0.0
2023-11-20 10:53:05,852 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 158600
2023-11-20 10:53:16,701 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.09 vs. limit=15.0
2023-11-20 10:53:18,560 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 2300, loss[loss=0.05694, simple_loss=0.07005, pruned_loss=0.01027, audio_tagging_loss=0.01164, over 14607.00 frames. ], tot_loss[loss=0.07958, simple_loss=0.1001, pruned_loss=0.01958, audio_tagging_loss=0.009941, over 3047911.48 frames. ], batch size: 57, lr: 4.99e-03, grad_scale: 8.0
2023-11-20 10:53:29,990 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.86 vs. limit=22.5
2023-11-20 10:53:31,782 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.567e+01 8.157e+01 8.586e+01 9.219e+01 1.150e+02, threshold=1.717e+02, percent-clipped=0.0
2023-11-20 10:53:32,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1057446.6666666667, ans=0.0
2023-11-20 10:53:39,024 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.86 vs. limit=15.0
2023-11-20 10:53:44,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1057513.3333333333, ans=0.04949747468305833
2023-11-20 10:54:13,239 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 158650
2023-11-20 10:54:16,848 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 10:54:24,143 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 2350, loss[loss=0.04966, simple_loss=0.06289, pruned_loss=0.00945, audio_tagging_loss=0.008772, over 14393.00 frames. ], tot_loss[loss=0.07951, simple_loss=0.09982, pruned_loss=0.01963, audio_tagging_loss=0.00997, over 3046795.08 frames. ], batch size: 56, lr: 4.99e-03, grad_scale: 8.0
2023-11-20 10:54:31,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1057713.3333333333, ans=0.09899494936611666
2023-11-20 10:54:43,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1057780.0, ans=0.125
2023-11-20 10:54:45,559 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.65 vs. limit=22.5
2023-11-20 10:54:50,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1057846.6666666667, ans=0.0
2023-11-20 10:55:05,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1057913.3333333333, ans=0.0
2023-11-20 10:55:07,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1057913.3333333333, ans=0.125
2023-11-20 10:55:17,746 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.54 vs. limit=6.0
2023-11-20 10:55:18,113 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 158700
2023-11-20 10:55:18,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1057980.0, ans=0.125
2023-11-20 10:55:29,617 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 2400, loss[loss=0.08192, simple_loss=0.1049, pruned_loss=0.02097, audio_tagging_loss=0.008507, over 15495.00 frames. ], tot_loss[loss=0.07983, simple_loss=0.1001, pruned_loss=0.01977, audio_tagging_loss=0.01003, over 3046943.16 frames. ], batch size: 57, lr: 4.99e-03, grad_scale: 16.0
2023-11-20 10:55:42,650 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.833e+01 8.027e+01 8.593e+01 9.391e+01 2.644e+02, threshold=1.719e+02, percent-clipped=1.0
2023-11-20 10:55:49,190 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.23 vs. limit=12.0
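
The batch 2400-era optim.py entry above is the one place in this stretch where clipping actually fired: the largest recent gradient norm (2.644e+02) exceeds the threshold, so percent-clipped is nonzero. Checking the logged numbers under the same threshold rule as before:

    median, largest, clipping_scale = 85.93, 264.4, 2.0
    threshold = clipping_scale * median
    print(round(threshold, 2))   # 171.86, matching the logged 1.719e+02
    print(largest > threshold)   # True: at least one recent step was clipped
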
2023-11-20 10:55:49,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1058113.3333333333, ans=0.0
2023-11-20 10:55:54,641 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.32 vs. limit=22.5
2023-11-20 10:56:14,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1058246.6666666667, ans=0.1
2023-11-20 10:56:14,872 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.41 vs. limit=15.0
2023-11-20 10:56:19,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1058246.6666666667, ans=0.125
2023-11-20 10:56:22,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=1058313.3333333333, ans=22.5
2023-11-20 10:56:23,299 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 158750
2023-11-20 10:56:23,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1058313.3333333333, ans=0.0
2023-11-20 10:56:26,314 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.94 vs. limit=15.0
2023-11-20 10:56:32,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1058313.3333333333, ans=10.0
2023-11-20 10:56:35,174 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 2450, loss[loss=0.07175, simple_loss=0.08925, pruned_loss=0.01688, audio_tagging_loss=0.01024, over 17003.00 frames. ], tot_loss[loss=0.07921, simple_loss=0.09906, pruned_loss=0.01942, audio_tagging_loss=0.01026, over 3049912.91 frames. ], batch size: 65, lr: 4.98e-03, grad_scale: 16.0
2023-11-20 10:56:47,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1058446.6666666667, ans=0.05
2023-11-20 10:56:49,537 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.62 vs. limit=15.0
2023-11-20 10:57:22,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1058580.0, ans=0.0
2023-11-20 10:57:26,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1058646.6666666667, ans=0.2
2023-11-20 10:57:29,302 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 158800
2023-11-20 10:57:36,886 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.51 vs. limit=22.5
2023-11-20 10:57:41,058 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 2500, loss[loss=0.1046, simple_loss=0.13, pruned_loss=0.02851, audio_tagging_loss=0.01113, over 16286.00 frames. ], tot_loss[loss=0.08009, simple_loss=0.1002, pruned_loss=0.01979, audio_tagging_loss=0.01022, over 3054067.71 frames. ], batch size: 57, lr: 4.98e-03, grad_scale: 16.0
2023-11-20 10:57:53,999 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.038e+01 8.094e+01 8.721e+01 9.744e+01 1.305e+02, threshold=1.744e+02, percent-clipped=0.0
2023-11-20 10:57:56,101 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.12 vs. limit=15.0
2023-11-20 10:58:34,493 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 158850
2023-11-20 10:58:46,279 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 2550, loss[loss=0.07974, simple_loss=0.1001, pruned_loss=0.02161, audio_tagging_loss=0.008085, over 14722.00 frames. ], tot_loss[loss=0.08068, simple_loss=0.1008, pruned_loss=0.01999, audio_tagging_loss=0.01027, over 3048962.93 frames. ], batch size: 57, lr: 4.98e-03, grad_scale: 16.0
2023-11-20 10:59:00,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1059113.3333333333, ans=0.0
2023-11-20 10:59:17,318 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0
2023-11-20 10:59:17,485 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.46 vs. limit=6.0
2023-11-20 10:59:18,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1059180.0, ans=0.125
2023-11-20 10:59:22,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=1059180.0, ans=6.0
2023-11-20 10:59:38,098 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.32 vs. limit=15.0
2023-11-20 10:59:38,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1059313.3333333333, ans=0.1
2023-11-20 10:59:39,876 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 158900
2023-11-20 10:59:41,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1059313.3333333333, ans=0.2
2023-11-20 10:59:46,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1059313.3333333333, ans=0.2
2023-11-20 10:59:51,253 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 2600, loss[loss=0.08848, simple_loss=0.1132, pruned_loss=0.01865, audio_tagging_loss=0.01325, over 14680.00 frames. ], tot_loss[loss=0.07949, simple_loss=0.09956, pruned_loss=0.01962, audio_tagging_loss=0.01009, over 3045732.87 frames. ], batch size: 56, lr: 4.98e-03, grad_scale: 16.0
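
The learning rate in the tot_loss entries drifts smoothly from 5.00e-03 at the top of this section to 4.98e-03 here, with no step changes. That drift is consistent with an Eden-style scheduler (as in icefall's optim.py), which decays the lr in both the batch and epoch counts; the constants below are assumed recipe values and the epoch/batch bookkeeping is approximate, so treat this as a sketch rather than this run's exact schedule:

    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    print(f"{eden_lr(0.045, 158000, 13.0):.3e}")  # 4.996e-03, near the logged lr
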
2023-11-20 11:00:04,928 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.842e+01 8.545e+01 8.985e+01 9.606e+01 1.560e+02, threshold=1.797e+02, percent-clipped=0.0
2023-11-20 11:00:19,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1059513.3333333333, ans=0.035
2023-11-20 11:00:22,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1059513.3333333333, ans=0.125
2023-11-20 11:00:43,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1059646.6666666667, ans=0.125
2023-11-20 11:00:44,954 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 158950
2023-11-20 11:00:49,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1059646.6666666667, ans=0.1
2023-11-20 11:00:55,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1059646.6666666667, ans=0.125
2023-11-20 11:00:57,147 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 2650, loss[loss=0.09282, simple_loss=0.1312, pruned_loss=0.0203, audio_tagging_loss=0.006919, over 15733.00 frames. ], tot_loss[loss=0.07938, simple_loss=0.09945, pruned_loss=0.01965, audio_tagging_loss=0.009998, over 3045283.19 frames. ], batch size: 57, lr: 4.98e-03, grad_scale: 16.0
2023-11-20 11:01:01,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1059713.3333333333, ans=0.0
2023-11-20 11:01:04,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1059713.3333333333, ans=0.05
2023-11-20 11:01:13,538 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.26 vs. limit=10.0
2023-11-20 11:01:20,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1059780.0, ans=0.0
2023-11-20 11:01:21,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1059846.6666666667, ans=0.0
2023-11-20 11:01:40,655 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.80 vs. limit=22.5
2023-11-20 11:01:50,669 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 159000
2023-11-20 11:01:58,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1059980.0, ans=0.1
2023-11-20 11:02:02,099 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.47 vs. limit=15.0
2023-11-20 11:02:02,632 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 2700, loss[loss=0.07065, simple_loss=0.08931, pruned_loss=0.01667, audio_tagging_loss=0.009328, over 15321.00 frames. ], tot_loss[loss=0.0791, simple_loss=0.09919, pruned_loss=0.01955, audio_tagging_loss=0.009951, over 3045105.29 frames. ], batch size: 56, lr: 4.98e-03, grad_scale: 16.0
2023-11-20 11:02:04,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1060046.6666666667, ans=0.125
2023-11-20 11:02:10,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1060046.6666666667, ans=0.0
2023-11-20 11:02:15,844 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.786e+01 7.991e+01 8.664e+01 9.430e+01 1.129e+02, threshold=1.733e+02, percent-clipped=0.0
2023-11-20 11:02:21,440 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.14 vs. limit=15.0
2023-11-20 11:02:23,749 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 11:02:29,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=1060180.0, ans=0.5
2023-11-20 11:02:36,254 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.52 vs. limit=15.0
2023-11-20 11:02:38,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1060180.0, ans=0.2
2023-11-20 11:02:56,835 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 159050
2023-11-20 11:03:08,518 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 2750, loss[loss=0.07784, simple_loss=0.1056, pruned_loss=0.01627, audio_tagging_loss=0.008784, over 14921.00 frames. ], tot_loss[loss=0.07949, simple_loss=0.09969, pruned_loss=0.01971, audio_tagging_loss=0.009931, over 3048013.14 frames. ], batch size: 53, lr: 4.98e-03, grad_scale: 16.0
2023-11-20 11:03:16,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1060380.0, ans=0.125
2023-11-20 11:03:32,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1060446.6666666667, ans=0.125
2023-11-20 11:03:35,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1060513.3333333333, ans=0.0
2023-11-20 11:03:49,547 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.73 vs. limit=22.5
2023-11-20 11:03:53,271 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.18 vs. limit=22.5
2023-11-20 11:04:02,006 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 159100
2023-11-20 11:04:04,411 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 11:04:11,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1060646.6666666667, ans=0.125
2023-11-20 11:04:13,744 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 2800, loss[loss=0.07944, simple_loss=0.1022, pruned_loss=0.01999, audio_tagging_loss=0.008346, over 15688.00 frames. ], tot_loss[loss=0.07994, simple_loss=0.1006, pruned_loss=0.0198, audio_tagging_loss=0.009824, over 3049754.23 frames. ], batch size: 60, lr: 4.98e-03, grad_scale: 32.0
2023-11-20 11:04:22,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1060713.3333333333, ans=0.025
2023-11-20 11:04:26,644 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.546e+01 8.183e+01 8.895e+01 9.590e+01 1.282e+02, threshold=1.779e+02, percent-clipped=0.0
2023-11-20 11:04:29,678 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.81 vs. limit=15.0
2023-11-20 11:04:53,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1060913.3333333333, ans=0.07
2023-11-20 11:04:55,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1060913.3333333333, ans=0.0
2023-11-20 11:04:58,010 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.49 vs. limit=10.0
2023-11-20 11:05:07,160 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 159150
2023-11-20 11:05:15,307 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 11:05:15,833 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.38 vs. limit=22.5
2023-11-20 11:05:18,765 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 2850, loss[loss=0.0678, simple_loss=0.08576, pruned_loss=0.01714, audio_tagging_loss=0.007777, over 15032.00 frames. ], tot_loss[loss=0.07923, simple_loss=0.09967, pruned_loss=0.01957, audio_tagging_loss=0.009827, over 3041387.36 frames. ], batch size: 57, lr: 4.98e-03, grad_scale: 32.0
2023-11-20 11:05:28,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1061046.6666666667, ans=0.05
2023-11-20 11:05:33,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1061113.3333333333, ans=0.125
2023-11-20 11:05:43,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1061180.0, ans=0.2
2023-11-20 11:06:05,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1061246.6666666667, ans=0.125
2023-11-20 11:06:12,335 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 159200
2023-11-20 11:06:12,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1061313.3333333333, ans=0.125
2023-11-20 11:06:24,458 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 2900, loss[loss=0.09445, simple_loss=0.1153, pruned_loss=0.02696, audio_tagging_loss=0.009824, over 13794.00 frames. ], tot_loss[loss=0.07939, simple_loss=0.09994, pruned_loss=0.01958, audio_tagging_loss=0.009832, over 3045784.26 frames. ], batch size: 53, lr: 4.98e-03, grad_scale: 32.0
2023-11-20 11:06:37,436 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.449e+01 8.069e+01 8.700e+01 9.440e+01 1.245e+02, threshold=1.740e+02, percent-clipped=0.0
2023-11-20 11:06:44,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1061446.6666666667, ans=0.125
2023-11-20 11:07:04,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1061580.0, ans=0.125
2023-11-20 11:07:10,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1061580.0, ans=0.2
2023-11-20 11:07:13,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1061580.0, ans=0.125
2023-11-20 11:07:18,177 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 159250
2023-11-20 11:07:18,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1061646.6666666667, ans=0.125
2023-11-20 11:07:29,979 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 2950, loss[loss=0.07471, simple_loss=0.088, pruned_loss=0.01797, audio_tagging_loss=0.01274, over 15184.00 frames. ], tot_loss[loss=0.07919, simple_loss=0.09964, pruned_loss=0.0195, audio_tagging_loss=0.009868, over 3042052.98 frames. ], batch size: 57, lr: 4.98e-03, grad_scale: 32.0
2023-11-20 11:07:30,788 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.40 vs. limit=15.0
2023-11-20 11:07:58,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1061846.6666666667, ans=0.1
2023-11-20 11:08:14,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1061913.3333333333, ans=0.07
2023-11-20 11:08:22,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1061980.0, ans=0.125
2023-11-20 11:08:23,525 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 159300
2023-11-20 11:08:28,972 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.62 vs. limit=15.0
2023-11-20 11:08:34,628 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 3000, loss[loss=0.09047, simple_loss=0.1186, pruned_loss=0.0213, audio_tagging_loss=0.009878, over 14807.00 frames. ], tot_loss[loss=0.07936, simple_loss=0.09983, pruned_loss=0.01952, audio_tagging_loss=0.009926, over 3039869.06 frames. ], batch size: 53, lr: 4.98e-03, grad_scale: 32.0
2023-11-20 11:08:34,632 INFO [train_asr.py:1285] (0/4) Computing validation loss
2023-11-20 11:09:14,364 INFO [train_asr.py:1294] (0/4) Epoch 14, validation: loss=0.06185, simple_loss=0.05368, pruned_loss=0.005702, audio_tagging_loss=0.02931, over 4681554.00 frames.
2023-11-20 11:09:14,365 INFO [train_asr.py:1295] (0/4) Maximum memory allocated so far is 25925MB
2023-11-20 11:09:24,232 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.89 vs. limit=12.0
2023-11-20 11:09:27,218 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.003e+01 8.131e+01 8.854e+01 9.762e+01 1.260e+02, threshold=1.771e+02, percent-clipped=0.0
2023-11-20 11:09:28,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1062113.3333333333, ans=0.125
2023-11-20 11:09:33,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=1062113.3333333333, ans=0.2
2023-11-20 11:09:37,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1062113.3333333333, ans=0.125
2023-11-20 11:09:49,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1062180.0, ans=0.0
2023-11-20 11:10:05,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1062313.3333333333, ans=0.0
2023-11-20 11:10:07,393 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 159350
2023-11-20 11:10:08,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1062313.3333333333, ans=0.125
2023-11-20 11:10:09,206 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.10 vs. limit=22.5
2023-11-20 11:10:19,221 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 3050, loss[loss=0.07369, simple_loss=0.09047, pruned_loss=0.01839, audio_tagging_loss=0.01007, over 16485.00 frames. ], tot_loss[loss=0.07957, simple_loss=0.1001, pruned_loss=0.01963, audio_tagging_loss=0.009911, over 3044650.09 frames. ], batch size: 62, lr: 4.97e-03, grad_scale: 32.0
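
At batch 3000 the loop pauses for a validation pass ("Computing validation loss"), reports a frame-weighted validation loss, and prints the peak CUDA memory. A minimal sketch of that sequence; compute_loss is an assumed helper standing in for the real loss computation in train_asr.py:

    import torch

    def validate(model, valid_loader, device):
        model.eval()
        tot_loss, tot_frames = 0.0, 0
        with torch.no_grad():
            for batch in valid_loader:
                # compute_loss would mirror the training loss and return the
                # summed loss plus the number of frames in the batch.
                loss, num_frames = compute_loss(model, batch, device)
                tot_loss += float(loss)
                tot_frames += num_frames
        model.train()
        mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"validation: loss={tot_loss / tot_frames:.5f}; "
              f"Maximum memory allocated so far is {mem_mb}MB")

Note that the validation audio_tagging_loss (0.02931) is roughly three times its running training value while simple_loss is about half, suggesting the validation batches are distributed differently from the training mix.
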
2023-11-20 11:10:27,208 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.03 vs. limit=15.0
2023-11-20 11:10:29,660 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.00 vs. limit=15.0
2023-11-20 11:10:56,752 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 11:10:56,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1062580.0, ans=0.2
2023-11-20 11:11:12,504 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 159400
2023-11-20 11:11:24,004 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 3100, loss[loss=0.07149, simple_loss=0.08163, pruned_loss=0.0184, audio_tagging_loss=0.01228, over 14440.00 frames. ], tot_loss[loss=0.07972, simple_loss=0.1005, pruned_loss=0.01951, audio_tagging_loss=0.009967, over 3041734.21 frames. ], batch size: 54, lr: 4.97e-03, grad_scale: 32.0
2023-11-20 11:11:29,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=1062713.3333333333, ans=10.0
2023-11-20 11:11:37,072 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.82 vs. limit=15.0
2023-11-20 11:11:37,687 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.763e+01 7.933e+01 8.636e+01 9.301e+01 1.154e+02, threshold=1.727e+02, percent-clipped=0.0
2023-11-20 11:11:46,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1062780.0, ans=0.0
2023-11-20 11:12:04,568 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.88 vs. limit=12.0
2023-11-20 11:12:18,310 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 159450
2023-11-20 11:12:29,852 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 3150, loss[loss=0.07667, simple_loss=0.09457, pruned_loss=0.01908, audio_tagging_loss=0.0103, over 15194.00 frames. ], tot_loss[loss=0.08009, simple_loss=0.1008, pruned_loss=0.01963, audio_tagging_loss=0.01005, over 3046636.33 frames. ], batch size: 56, lr: 4.97e-03, grad_scale: 32.0
2023-11-20 11:12:36,031 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.45 vs. limit=22.5
2023-11-20 11:12:58,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1063180.0, ans=0.0
2023-11-20 11:13:24,109 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 159500
2023-11-20 11:13:35,131 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.83 vs. limit=15.0
2023-11-20 11:13:35,645 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 3200, loss[loss=0.1009, simple_loss=0.1346, pruned_loss=0.02464, audio_tagging_loss=0.008906, over 15915.00 frames. ], tot_loss[loss=0.08067, simple_loss=0.1016, pruned_loss=0.01977, audio_tagging_loss=0.01012, over 3044712.10 frames. ], batch size: 54, lr: 4.97e-03, grad_scale: 32.0
2023-11-20 11:13:39,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1063380.0, ans=0.1
2023-11-20 11:13:47,902 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.818e+01 8.344e+01 9.152e+01 9.986e+01 1.362e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-20 11:13:48,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1063446.6666666667, ans=0.025
2023-11-20 11:13:53,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1063446.6666666667, ans=0.2
2023-11-20 11:14:12,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1063513.3333333333, ans=0.07
2023-11-20 11:14:15,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1063580.0, ans=0.125
2023-11-20 11:14:29,404 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 159550
2023-11-20 11:14:31,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1063646.6666666667, ans=0.125
2023-11-20 11:14:40,182 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 3250, loss[loss=0.09636, simple_loss=0.1166, pruned_loss=0.02507, audio_tagging_loss=0.01299, over 15611.00 frames. ], tot_loss[loss=0.08017, simple_loss=0.1008, pruned_loss=0.01952, audio_tagging_loss=0.01023, over 3052955.53 frames. ], batch size: 58, lr: 4.97e-03, grad_scale: 32.0
2023-11-20 11:14:43,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1063713.3333333333, ans=0.0
2023-11-20 11:15:02,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1063780.0, ans=0.0
2023-11-20 11:15:06,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1063846.6666666667, ans=0.125
2023-11-20 11:15:29,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1063913.3333333333, ans=0.125
2023-11-20 11:15:34,363 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 159600
2023-11-20 11:15:38,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1063980.0, ans=0.125
2023-11-20 11:15:41,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1063980.0, ans=0.125
2023-11-20 11:15:45,714 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 3300, loss[loss=0.07928, simple_loss=0.1055, pruned_loss=0.01971, audio_tagging_loss=0.00682, over 14596.00 frames. ], tot_loss[loss=0.07971, simple_loss=0.1003, pruned_loss=0.01928, audio_tagging_loss=0.01027, over 3051610.97 frames. ], batch size: 54, lr: 4.97e-03, grad_scale: 32.0
2023-11-20 11:15:54,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1064046.6666666667, ans=0.0
2023-11-20 11:15:55,896 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.10 vs. limit=15.0
2023-11-20 11:15:58,826 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.658e+01 7.969e+01 8.807e+01 9.518e+01 1.189e+02, threshold=1.761e+02, percent-clipped=0.0
2023-11-20 11:15:59,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1064113.3333333333, ans=0.0
2023-11-20 11:16:18,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1064180.0, ans=0.125
2023-11-20 11:16:21,337 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.50 vs. limit=22.5
2023-11-20 11:16:29,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1064246.6666666667, ans=0.125
2023-11-20 11:16:40,262 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 159650
2023-11-20 11:16:45,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1064313.3333333333, ans=0.1
2023-11-20 11:16:51,797 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 3350, loss[loss=0.08243, simple_loss=0.1009, pruned_loss=0.02192, audio_tagging_loss=0.01005, over 15067.00 frames. ], tot_loss[loss=0.08012, simple_loss=0.1008, pruned_loss=0.01954, audio_tagging_loss=0.01016, over 3049546.09 frames. ], batch size: 57, lr: 4.97e-03, grad_scale: 32.0
2023-11-20 11:16:58,264 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 11:16:58,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1064380.0, ans=0.125
2023-11-20 11:17:32,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1064580.0, ans=15.0
2023-11-20 11:17:45,141 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 159700
2023-11-20 11:17:45,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1064646.6666666667, ans=0.2
2023-11-20 11:17:45,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1064646.6666666667, ans=0.125
2023-11-20 11:17:48,265 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.78 vs. limit=15.0
2023-11-20 11:17:50,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1064646.6666666667, ans=0.125
2023-11-20 11:17:51,988 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.30 vs. limit=6.0
2023-11-20 11:17:56,231 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 3400, loss[loss=0.06187, simple_loss=0.07909, pruned_loss=0.01384, audio_tagging_loss=0.008477, over 14819.00 frames. ], tot_loss[loss=0.07918, simple_loss=0.09969, pruned_loss=0.0193, audio_tagging_loss=0.01003, over 3050879.00 frames. ], batch size: 55, lr: 4.97e-03, grad_scale: 16.0
2023-11-20 11:18:10,597 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.407e+01 8.209e+01 8.921e+01 9.607e+01 2.745e+02, threshold=1.784e+02, percent-clipped=1.0
2023-11-20 11:18:12,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1064780.0, ans=0.125
2023-11-20 11:18:17,476 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.27 vs. limit=15.0
2023-11-20 11:18:32,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1064846.6666666667, ans=0.0
2023-11-20 11:18:49,735 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 159750
2023-11-20 11:18:57,023 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.36 vs. limit=22.5
2023-11-20 11:19:01,462 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 3450, loss[loss=0.07297, simple_loss=0.08879, pruned_loss=0.01462, audio_tagging_loss=0.01396, over 14169.00 frames. ], tot_loss[loss=0.07902, simple_loss=0.09983, pruned_loss=0.01921, audio_tagging_loss=0.009902, over 3048040.28 frames. ], batch size: 54, lr: 4.97e-03, grad_scale: 8.0
2023-11-20 11:19:12,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1065046.6666666667, ans=0.125
2023-11-20 11:19:54,758 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 159800
2023-11-20 11:20:07,005 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 3500, loss[loss=0.08404, simple_loss=0.1161, pruned_loss=0.01885, audio_tagging_loss=0.007131, over 15168.00 frames. ], tot_loss[loss=0.07936, simple_loss=0.1002, pruned_loss=0.01943, audio_tagging_loss=0.009828, over 3043985.61 frames. ], batch size: 56, lr: 4.97e-03, grad_scale: 8.0
2023-11-20 11:20:13,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1065380.0, ans=0.125
2023-11-20 11:20:14,628 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.09 vs. limit=15.0
2023-11-20 11:20:22,598 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.721e+01 8.437e+01 9.163e+01 1.016e+02 1.154e+02, threshold=1.833e+02, percent-clipped=0.0
2023-11-20 11:20:29,351 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.26 vs. limit=22.5
2023-11-20 11:20:30,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1065446.6666666667, ans=0.0
2023-11-20 11:20:40,565 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 11:20:51,279 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 11:20:53,180 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.30 vs. limit=15.0
2023-11-20 11:21:00,623 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 159850
2023-11-20 11:21:04,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1065646.6666666667, ans=0.125
2023-11-20 11:21:11,685 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 3550, loss[loss=0.06809, simple_loss=0.08165, pruned_loss=0.01855, audio_tagging_loss=0.008719, over 13999.00 frames. ], tot_loss[loss=0.07933, simple_loss=0.1001, pruned_loss=0.01945, audio_tagging_loss=0.009829, over 3044241.30 frames. ], batch size: 53, lr: 4.97e-03, grad_scale: 8.0
2023-11-20 11:21:13,721 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.77 vs. limit=6.0
2023-11-20 11:21:17,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1065713.3333333333, ans=0.125
2023-11-20 11:21:18,324 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=4.665e-02
2023-11-20 11:21:23,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1065780.0, ans=0.0
2023-11-20 11:21:27,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1065780.0, ans=0.0
2023-11-20 11:21:44,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1065846.6666666667, ans=0.2
2023-11-20 11:22:04,638 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 159900
2023-11-20 11:22:06,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1065980.0, ans=0.125
2023-11-20 11:22:16,472 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 3600, loss[loss=0.08069, simple_loss=0.09767, pruned_loss=0.02086, audio_tagging_loss=0.011, over 15102.00 frames. ], tot_loss[loss=0.07928, simple_loss=0.09974, pruned_loss=0.01949, audio_tagging_loss=0.00992, over 3042513.37 frames. ], batch size: 56, lr: 4.97e-03, grad_scale: 16.0
2023-11-20 11:22:28,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1066113.3333333333, ans=0.125
2023-11-20 11:22:31,765 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.755e+01 8.370e+01 9.104e+01 9.925e+01 1.510e+02, threshold=1.821e+02, percent-clipped=0.0
2023-11-20 11:22:33,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=1066113.3333333333, ans=0.02
2023-11-20 11:22:48,916 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.70 vs. limit=22.5
2023-11-20 11:22:50,212 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.47 vs. limit=10.0
2023-11-20 11:23:09,957 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 159950 2023-11-20 11:23:15,706 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.86 vs. limit=10.0 2023-11-20 11:23:21,979 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 3650, loss[loss=0.07707, simple_loss=0.1053, pruned_loss=0.01437, audio_tagging_loss=0.01003, over 14865.00 frames. ], tot_loss[loss=0.08029, simple_loss=0.1011, pruned_loss=0.01986, audio_tagging_loss=0.00988, over 3047516.39 frames. ], batch size: 57, lr: 4.97e-03, grad_scale: 16.0 2023-11-20 11:23:28,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1066380.0, ans=0.0 2023-11-20 11:23:33,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1066380.0, ans=0.0 2023-11-20 11:23:38,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1066446.6666666667, ans=0.125 2023-11-20 11:23:41,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1066446.6666666667, ans=0.0 2023-11-20 11:23:56,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1066513.3333333333, ans=0.1 2023-11-20 11:24:09,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1066580.0, ans=0.125 2023-11-20 11:24:13,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1066646.6666666667, ans=0.025 2023-11-20 11:24:15,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1066646.6666666667, ans=0.125 2023-11-20 11:24:16,562 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 160000 2023-11-20 11:24:18,102 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-160000.pt 2023-11-20 11:24:29,935 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.64 vs. limit=12.0 2023-11-20 11:24:31,858 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 3700, loss[loss=0.08961, simple_loss=0.1112, pruned_loss=0.02747, audio_tagging_loss=0.00653, over 14758.00 frames. ], tot_loss[loss=0.08037, simple_loss=0.1011, pruned_loss=0.01999, audio_tagging_loss=0.009807, over 3044569.64 frames. ], batch size: 56, lr: 4.96e-03, grad_scale: 16.0 2023-11-20 11:24:35,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1066713.3333333333, ans=0.125 2023-11-20 11:24:47,122 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.803e+01 8.291e+01 8.874e+01 9.793e+01 1.503e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-20 11:25:05,888 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.42 vs.
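limit=22.5

A checkpoint also lands in this stretch: checkpoint-160000.pt is written right after the "Current batch idx: 160000" entry above, so saving appears to be keyed to the global batch index, with the index baked into the filename. A minimal sketch of that cadence (the 4000-batch interval and the dict contents are assumptions, not read from the trainer):

import torch

def maybe_save_checkpoint(model, optimizer, batch_idx, exp_dir, save_every_n=4000):
    # e.g. batch_idx=160000 -> <exp_dir>/checkpoint-160000.pt, as logged above
    if batch_idx > 0 and batch_idx % save_every_n == 0:
        torch.save(
            {
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "batch_idx_train": batch_idx,
            },
            f"{exp_dir}/checkpoint-{batch_idx}.pt",
        )
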
2023-11-20 11:25:08,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1066846.6666666667, ans=0.125 2023-11-20 11:25:25,340 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 160050 2023-11-20 11:25:27,104 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.22 vs. limit=15.0 2023-11-20 11:25:36,919 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 3750, loss[loss=0.07032, simple_loss=0.09193, pruned_loss=0.01581, audio_tagging_loss=0.008539, over 15041.00 frames. ], tot_loss[loss=0.07976, simple_loss=0.1003, pruned_loss=0.01969, audio_tagging_loss=0.009915, over 3047194.74 frames. ], batch size: 57, lr: 4.96e-03, grad_scale: 16.0 2023-11-20 11:25:41,508 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.31 vs. limit=15.0 2023-11-20 11:25:56,697 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.22 vs. limit=15.0 2023-11-20 11:26:22,728 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 11:26:30,173 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 160100 2023-11-20 11:26:41,898 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 3800, loss[loss=0.07941, simple_loss=0.1059, pruned_loss=0.01662, audio_tagging_loss=0.009857, over 15947.00 frames. ], tot_loss[loss=0.07966, simple_loss=0.1003, pruned_loss=0.01956, audio_tagging_loss=0.009954, over 3053415.34 frames. ], batch size: 59, lr: 4.96e-03, grad_scale: 16.0 2023-11-20 11:26:55,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1067446.6666666667, ans=0.2 2023-11-20 11:26:55,965 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.53 vs. limit=12.0 2023-11-20 11:26:57,808 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.898e+01 8.245e+01 9.019e+01 9.669e+01 1.480e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-20 11:27:03,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1067446.6666666667, ans=0.0 2023-11-20 11:27:29,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1067580.0, ans=0.0 2023-11-20 11:27:36,397 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 160150 2023-11-20 11:27:39,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1067646.6666666667, ans=0.0 2023-11-20 11:27:48,121 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 3850, loss[loss=0.08338, simple_loss=0.1124, pruned_loss=0.0206, audio_tagging_loss=0.006602, over 15767.00 frames.
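], tot_loss[loss=0.07928, simple_loss=0.09967, pruned_loss=0.01945, audio_tagging_loss=0.009995, over 3058603.46 frames. ], batch size: 58, lr: 4.96e-03, grad_scale: 16.0

The loss fields in these train_asr.py lines are internally consistent with one fixed combination: half of simple_loss, plus pruned_loss, plus audio_tagging_loss at full weight. Checking the tot_loss entry just above with plain arithmetic (the same identity holds for the other entries spot-checked in this log):

simple_loss, pruned_loss, audio_tagging_loss = 0.09967, 0.01945, 0.009995
loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss
assert abs(loss - 0.07928) < 5e-5   # tot_loss[loss=0.07928, ...] above
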
2023-11-20 11:27:50,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1067713.3333333333, ans=0.2 2023-11-20 11:28:16,423 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.33 vs. limit=15.0 2023-11-20 11:28:31,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1067913.3333333333, ans=0.125 2023-11-20 11:28:32,843 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.88 vs. limit=15.0 2023-11-20 11:28:41,419 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 160200 2023-11-20 11:28:53,408 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 3900, loss[loss=0.0756, simple_loss=0.09769, pruned_loss=0.01474, audio_tagging_loss=0.01202, over 14825.00 frames. ], tot_loss[loss=0.07956, simple_loss=0.1001, pruned_loss=0.01946, audio_tagging_loss=0.01006, over 3048718.99 frames. ], batch size: 57, lr: 4.96e-03, grad_scale: 16.0 2023-11-20 11:28:58,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1068046.6666666667, ans=0.125 2023-11-20 11:29:03,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1068046.6666666667, ans=0.0 2023-11-20 11:29:08,872 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.848e+01 8.101e+01 8.668e+01 9.712e+01 1.300e+02, threshold=1.734e+02, percent-clipped=0.0 2023-11-20 11:29:19,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1068180.0, ans=0.2 2023-11-20 11:29:24,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1068180.0, ans=0.125 2023-11-20 11:29:44,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1068313.3333333333, ans=10.0 2023-11-20 11:29:46,806 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 160250 2023-11-20 11:29:50,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1068313.3333333333, ans=0.0 2023-11-20 11:29:52,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1068313.3333333333, ans=0.125 2023-11-20 11:29:52,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1068313.3333333333, ans=0.125 2023-11-20 11:29:57,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1068380.0, ans=0.2 2023-11-20 11:29:58,758 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 3950, loss[loss=0.06526, simple_loss=0.07258, pruned_loss=0.01533, audio_tagging_loss=0.01365, over 14987.00 frames. ], tot_loss[loss=0.07914, simple_loss=0.09917, pruned_loss=0.01934, audio_tagging_loss=0.01022, over 3051682.67 frames.
], batch size: 55, lr: 4.96e-03, grad_scale: 16.0 2023-11-20 11:30:11,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=1068446.6666666667, ans=15.0 2023-11-20 11:30:52,483 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 160300 2023-11-20 11:30:53,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1068646.6666666667, ans=0.2 2023-11-20 11:30:57,431 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.35 vs. limit=6.0 2023-11-20 11:31:04,113 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 4000, loss[loss=0.098, simple_loss=0.128, pruned_loss=0.02363, audio_tagging_loss=0.01039, over 16189.00 frames. ], tot_loss[loss=0.08011, simple_loss=0.1006, pruned_loss=0.01962, audio_tagging_loss=0.01019, over 3045050.91 frames. ], batch size: 61, lr: 4.96e-03, grad_scale: 32.0 2023-11-20 11:31:06,117 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.49 vs. limit=12.0 2023-11-20 11:31:16,593 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.01 vs. limit=15.0 2023-11-20 11:31:18,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1068780.0, ans=0.0 2023-11-20 11:31:20,247 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.967e+01 8.155e+01 8.816e+01 9.659e+01 1.219e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-20 11:31:29,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=1068846.6666666667, ans=15.0 2023-11-20 11:31:57,400 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 160350 2023-11-20 11:32:08,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1069046.6666666667, ans=0.125 2023-11-20 11:32:09,814 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 4050, loss[loss=0.09301, simple_loss=0.1235, pruned_loss=0.0208, audio_tagging_loss=0.01047, over 14976.00 frames. ], tot_loss[loss=0.07973, simple_loss=0.09988, pruned_loss=0.01948, audio_tagging_loss=0.01031, over 3045921.18 frames. ], batch size: 54, lr: 4.96e-03, grad_scale: 16.0 2023-11-20 11:32:13,604 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 11:32:15,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1069046.6666666667, ans=0.125 2023-11-20 11:32:29,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1069113.3333333333, ans=0.1 2023-11-20 11:32:33,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1069113.3333333333, ans=0.125 2023-11-20 11:32:33,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=1069113.3333333333, ans=0.02 2023-11-20 11:32:40,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1069180.0, ans=0.2 2023-11-20 11:32:54,347 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.79 vs. limit=22.5 2023-11-20 11:32:59,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1069246.6666666667, ans=0.0 2023-11-20 11:33:02,901 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 160400 2023-11-20 11:33:04,732 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 11:33:09,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1069313.3333333333, ans=0.125 2023-11-20 11:33:14,104 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 4100, loss[loss=0.07522, simple_loss=0.09186, pruned_loss=0.01815, audio_tagging_loss=0.01115, over 15420.00 frames. ], tot_loss[loss=0.07941, simple_loss=0.0997, pruned_loss=0.01931, audio_tagging_loss=0.01025, over 3038304.40 frames. ], batch size: 58, lr: 4.96e-03, grad_scale: 16.0 2023-11-20 11:33:15,801 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0 2023-11-20 11:33:31,211 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.500e+01 8.137e+01 8.868e+01 9.537e+01 1.552e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-20 11:33:41,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1069513.3333333333, ans=0.04949747468305833 2023-11-20 11:33:43,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1069513.3333333333, ans=0.125 2023-11-20 11:33:50,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1069513.3333333333, ans=0.2 2023-11-20 11:34:04,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1069646.6666666667, ans=0.0 2023-11-20 11:34:07,301 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 160450 2023-11-20 11:34:07,487 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 11:34:11,423 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.68 vs. 
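limit=15.0

Each optim.py line pairs five grad-norm statistics (min, 25%, median, 75%, max over recent batches) with a clipping threshold, and with Clipping_scale=2.0 the threshold tracks twice the logged median: 2 x 8.868e+01 = 1.774e+02 in the optim.py entry above. percent-clipped then reads as the share of recent batches whose gradient norm exceeded the threshold. A sketch of that bookkeeping under those assumptions (the class and window size are illustrative, not the optimizer's actual code):

import collections
import torch

class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
        self.clipping_scale = clipping_scale
        self.norms = collections.deque(maxlen=window)    # recent grad norms
        self.clipped = collections.deque(maxlen=window)  # 1.0 where clipping fired

    def __call__(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.linalg.vector_norm(
            torch.cat([p.grad.flatten() for p in params])).item()
        self.norms.append(norm)
        history = torch.tensor(list(self.norms))
        quartiles = torch.quantile(
            history, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * quartiles[2].item()  # 2.0 x median
        self.clipped.append(1.0 if norm > threshold else 0.0)
        if norm > threshold:
            for p in params:   # scale every gradient down to the threshold
                p.grad.mul_(threshold / norm)
        return 100.0 * sum(self.clipped) / len(self.clipped)   # percent-clipped
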
2023-11-20 11:34:19,034 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 4150, loss[loss=0.06547, simple_loss=0.08629, pruned_loss=0.01408, audio_tagging_loss=0.008246, over 15854.00 frames. ], tot_loss[loss=0.08006, simple_loss=0.1007, pruned_loss=0.01963, audio_tagging_loss=0.01007, over 3037954.92 frames. ], batch size: 59, lr: 4.96e-03, grad_scale: 16.0 2023-11-20 11:34:33,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1069780.0, ans=0.125 2023-11-20 11:34:33,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1069780.0, ans=0.125 2023-11-20 11:34:44,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1069846.6666666667, ans=0.125 2023-11-20 11:34:48,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1069846.6666666667, ans=0.1 2023-11-20 11:35:04,541 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.16 vs. limit=15.0 2023-11-20 11:35:06,446 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 11:35:11,520 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 160500 2023-11-20 11:35:22,607 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 4200, loss[loss=0.09011, simple_loss=0.1065, pruned_loss=0.02605, audio_tagging_loss=0.01079, over 14907.00 frames. ], tot_loss[loss=0.08041, simple_loss=0.1013, pruned_loss=0.01979, audio_tagging_loss=0.00996, over 3039665.09 frames. ], batch size: 56, lr: 4.96e-03, grad_scale: 16.0 2023-11-20 11:35:23,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=1070046.6666666667, ans=22.5 2023-11-20 11:35:30,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1070046.6666666667, ans=0.1 2023-11-20 11:35:40,169 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.812e+01 8.053e+01 8.867e+01 9.480e+01 1.332e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-20 11:35:52,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1070180.0, ans=0.1 2023-11-20 11:35:56,945 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.99 vs. limit=12.0 2023-11-20 11:36:03,321 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.05 vs.
limit=22.5 2023-11-20 11:36:05,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1070246.6666666667, ans=0.125 2023-11-20 11:36:08,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1070246.6666666667, ans=0.0 2023-11-20 11:36:08,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1070246.6666666667, ans=0.125 2023-11-20 11:36:16,925 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 160550 2023-11-20 11:36:20,403 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.28 vs. limit=15.0 2023-11-20 11:36:25,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1070313.3333333333, ans=0.0 2023-11-20 11:36:28,453 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 4250, loss[loss=0.08469, simple_loss=0.1147, pruned_loss=0.02182, audio_tagging_loss=0.005524, over 15185.00 frames. ], tot_loss[loss=0.08066, simple_loss=0.1019, pruned_loss=0.01987, audio_tagging_loss=0.00983, over 3043647.85 frames. ], batch size: 54, lr: 4.96e-03, grad_scale: 16.0 2023-11-20 11:36:32,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1070380.0, ans=0.1 2023-11-20 11:36:57,371 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 11:37:22,885 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 160600 2023-11-20 11:37:34,836 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 4300, loss[loss=0.06626, simple_loss=0.07756, pruned_loss=0.01614, audio_tagging_loss=0.01133, over 14650.00 frames. ], tot_loss[loss=0.08068, simple_loss=0.1022, pruned_loss=0.01986, audio_tagging_loss=0.009719, over 3041919.66 frames. ], batch size: 55, lr: 4.96e-03, grad_scale: 16.0 2023-11-20 11:37:39,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1070713.3333333333, ans=0.125 2023-11-20 11:37:43,940 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.51 vs. limit=22.5 2023-11-20 11:37:48,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1070780.0, ans=0.125 2023-11-20 11:37:49,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1070780.0, ans=0.2 2023-11-20 11:37:50,507 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.072e+01 8.383e+01 9.324e+01 1.005e+02 1.336e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-20 11:38:03,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1070846.6666666667, ans=0.125 2023-11-20 11:38:17,046 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.72 vs. limit=22.5 2023-11-20 11:38:27,611 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.90 vs. 
limit=15.0 2023-11-20 11:38:28,424 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 160650 2023-11-20 11:38:39,426 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 4350, loss[loss=0.06786, simple_loss=0.08483, pruned_loss=0.01627, audio_tagging_loss=0.009173, over 14466.00 frames. ], tot_loss[loss=0.08089, simple_loss=0.1024, pruned_loss=0.01997, audio_tagging_loss=0.009721, over 3046460.04 frames. ], batch size: 55, lr: 4.95e-03, grad_scale: 16.0 2023-11-20 11:38:42,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1071046.6666666667, ans=0.125 2023-11-20 11:38:55,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1071113.3333333333, ans=0.0 2023-11-20 11:39:03,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1071113.3333333333, ans=0.125 2023-11-20 11:39:05,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1071180.0, ans=0.1 2023-11-20 11:39:07,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1071180.0, ans=0.125 2023-11-20 11:39:32,562 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 160700 2023-11-20 11:39:44,639 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.04 vs. limit=15.0 2023-11-20 11:39:45,087 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 4400, loss[loss=0.0769, simple_loss=0.09792, pruned_loss=0.01774, audio_tagging_loss=0.0102, over 15063.00 frames. ], tot_loss[loss=0.08064, simple_loss=0.1019, pruned_loss=0.01988, audio_tagging_loss=0.009788, over 3045834.81 frames. ], batch size: 59, lr: 4.95e-03, grad_scale: 32.0 2023-11-20 11:40:02,271 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.072e+01 8.159e+01 8.593e+01 9.338e+01 1.252e+02, threshold=1.719e+02, percent-clipped=0.0 2023-11-20 11:40:22,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1071580.0, ans=0.125 2023-11-20 11:40:24,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1071580.0, ans=0.1 2023-11-20 11:40:32,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1071580.0, ans=0.125 2023-11-20 11:40:36,745 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.64 vs. limit=15.0 2023-11-20 11:40:38,687 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 160750 2023-11-20 11:40:50,810 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 4450, loss[loss=0.08289, simple_loss=0.1016, pruned_loss=0.01902, audio_tagging_loss=0.01306, over 15301.00 frames. ], tot_loss[loss=0.08079, simple_loss=0.1022, pruned_loss=0.01996, audio_tagging_loss=0.009746, over 3051875.82 frames. 
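], batch size: 58, lr: 4.95e-03, grad_scale: 32.0

grad_scale has doubled from 16.0 to 32.0 across the last few entries, and throughout this log it only ever moves in powers of two (8.0 earlier, back down to 16.0 later), the signature of dynamic loss scaling in an fp16 run: the scale grows after a stretch of overflow-free steps and is cut back when a gradient overflows. A generic sketch with PyTorch's stock scaler, standard AMP usage rather than this trainer's own loop (the batch keys and criterion are placeholders):

import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=8.0,       # the smallest grad_scale seen in this log
    growth_factor=2.0,    # doubles after growth_interval clean steps
    backoff_factor=0.5,   # halves as soon as a gradient overflows
    growth_interval=2000,
)

def training_step(model, optimizer, batch, criterion):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = criterion(model(batch["inputs"]), batch["targets"])
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # silently skips the step on inf/nan grads
    scaler.update()          # grows or backs off the scale, as logged above
    return scaler.get_scale()
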
2023-11-20 11:41:13,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1071780.0, ans=0.2 2023-11-20 11:41:41,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=1071913.3333333333, ans=15.0 2023-11-20 11:41:44,404 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 160800 2023-11-20 11:41:46,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1071980.0, ans=0.2 2023-11-20 11:41:56,228 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 4500, loss[loss=0.07993, simple_loss=0.09977, pruned_loss=0.01954, audio_tagging_loss=0.0105, over 15720.00 frames. ], tot_loss[loss=0.08125, simple_loss=0.1025, pruned_loss=0.02014, audio_tagging_loss=0.009844, over 3052457.22 frames. ], batch size: 59, lr: 4.95e-03, grad_scale: 32.0 2023-11-20 11:42:12,853 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.114e+01 8.604e+01 9.143e+01 9.908e+01 1.250e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-20 11:42:17,940 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.37 vs. limit=15.0 2023-11-20 11:42:44,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1072246.6666666667, ans=0.2 2023-11-20 11:42:50,257 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 160850 2023-11-20 11:42:56,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1072313.3333333333, ans=0.1 2023-11-20 11:43:01,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1072380.0, ans=0.0 2023-11-20 11:43:01,956 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 4550, loss[loss=0.08569, simple_loss=0.1048, pruned_loss=0.02339, audio_tagging_loss=0.009915, over 15225.00 frames. ], tot_loss[loss=0.08057, simple_loss=0.1015, pruned_loss=0.02005, audio_tagging_loss=0.009791, over 3044375.94 frames. ], batch size: 56, lr: 4.95e-03, grad_scale: 32.0 2023-11-20 11:43:19,926 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.49 vs. limit=22.5 2023-11-20 11:43:28,283 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.49 vs. limit=15.0 2023-11-20 11:43:35,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1072513.3333333333, ans=0.125 2023-11-20 11:43:45,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1072580.0, ans=0.0 2023-11-20 11:43:48,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1072580.0, ans=0.1 2023-11-20 11:43:48,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1072580.0, ans=0.2 2023-11-20 11:43:53,076 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100.
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 11:43:55,646 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 160900 2023-11-20 11:44:03,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1072646.6666666667, ans=0.0 2023-11-20 11:44:08,148 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 4600, loss[loss=0.07451, simple_loss=0.08946, pruned_loss=0.01638, audio_tagging_loss=0.01339, over 14495.00 frames. ], tot_loss[loss=0.08037, simple_loss=0.101, pruned_loss=0.0199, audio_tagging_loss=0.009972, over 3037114.27 frames. ], batch size: 54, lr: 4.95e-03, grad_scale: 32.0 2023-11-20 11:44:24,868 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.602e+01 8.263e+01 8.597e+01 9.248e+01 1.165e+02, threshold=1.719e+02, percent-clipped=0.0 2023-11-20 11:44:57,954 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.86 vs. limit=10.0 2023-11-20 11:45:01,653 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 160950 2023-11-20 11:45:03,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1072980.0, ans=0.1 2023-11-20 11:45:09,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1072980.0, ans=0.125 2023-11-20 11:45:12,491 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 4650, loss[loss=0.07701, simple_loss=0.08466, pruned_loss=0.02139, audio_tagging_loss=0.01328, over 14200.00 frames. ], tot_loss[loss=0.07962, simple_loss=0.09963, pruned_loss=0.01969, audio_tagging_loss=0.01011, over 3030656.51 frames. ], batch size: 57, lr: 4.95e-03, grad_scale: 32.0 2023-11-20 11:45:18,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1073046.6666666667, ans=10.0 2023-11-20 11:45:26,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1073113.3333333333, ans=0.125 2023-11-20 11:45:28,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1073113.3333333333, ans=0.0 2023-11-20 11:46:05,356 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 161000 2023-11-20 11:46:08,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1073313.3333333333, ans=0.125 2023-11-20 11:46:17,319 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 4700, loss[loss=0.09546, simple_loss=0.118, pruned_loss=0.02857, audio_tagging_loss=0.007915, over 15390.00 frames. ], tot_loss[loss=0.07962, simple_loss=0.09962, pruned_loss=0.01961, audio_tagging_loss=0.0102, over 3035009.44 frames. 
], batch size: 56, lr: 4.95e-03, grad_scale: 32.0 2023-11-20 11:46:27,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1073380.0, ans=0.2 2023-11-20 11:46:34,228 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.63 vs. limit=15.0 2023-11-20 11:46:34,689 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.595e+01 8.445e+01 9.172e+01 1.004e+02 1.426e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-20 11:46:51,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1073513.3333333333, ans=0.0 2023-11-20 11:46:56,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1073580.0, ans=0.0 2023-11-20 11:47:03,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1073580.0, ans=0.2 2023-11-20 11:47:04,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1073580.0, ans=0.125 2023-11-20 11:47:05,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1073580.0, ans=0.125 2023-11-20 11:47:10,717 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 161050 2023-11-20 11:47:22,427 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 4750, loss[loss=0.0707, simple_loss=0.08614, pruned_loss=0.01783, audio_tagging_loss=0.009799, over 14975.00 frames. ], tot_loss[loss=0.07911, simple_loss=0.09895, pruned_loss=0.01933, audio_tagging_loss=0.0103, over 3038449.94 frames. ], batch size: 57, lr: 4.95e-03, grad_scale: 16.0 2023-11-20 11:47:23,070 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.77 vs. limit=15.0 2023-11-20 11:47:42,692 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.61 vs. limit=22.5 2023-11-20 11:47:45,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1073780.0, ans=0.125 2023-11-20 11:47:45,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1073780.0, ans=0.1 2023-11-20 11:47:51,108 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.53 vs. limit=15.0 2023-11-20 11:47:58,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1073846.6666666667, ans=0.0 2023-11-20 11:48:15,305 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 161100 2023-11-20 11:48:27,012 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 4800, loss[loss=0.07386, simple_loss=0.09599, pruned_loss=0.01392, audio_tagging_loss=0.01195, over 14713.00 frames. ], tot_loss[loss=0.07802, simple_loss=0.09739, pruned_loss=0.0189, audio_tagging_loss=0.01042, over 3035533.07 frames. 
], batch size: 55, lr: 4.95e-03, grad_scale: 16.0 2023-11-20 11:48:29,063 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.59 vs. limit=10.0 2023-11-20 11:48:45,982 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.153e+01 8.077e+01 9.114e+01 9.869e+01 1.463e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-20 11:48:46,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1074113.3333333333, ans=0.125 2023-11-20 11:49:11,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1074246.6666666667, ans=0.125 2023-11-20 11:49:19,965 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 161150 2023-11-20 11:49:31,588 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 4850, loss[loss=0.0556, simple_loss=0.06345, pruned_loss=0.01132, audio_tagging_loss=0.01255, over 16694.00 frames. ], tot_loss[loss=0.07798, simple_loss=0.09722, pruned_loss=0.01885, audio_tagging_loss=0.01052, over 3044136.13 frames. ], batch size: 64, lr: 4.95e-03, grad_scale: 16.0 2023-11-20 11:49:34,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1074380.0, ans=0.1 2023-11-20 11:49:36,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1074380.0, ans=0.125 2023-11-20 11:49:40,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1074380.0, ans=0.0 2023-11-20 11:50:20,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1074580.0, ans=0.2 2023-11-20 11:50:25,137 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 161200 2023-11-20 11:50:26,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1074646.6666666667, ans=0.0 2023-11-20 11:50:36,472 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 4900, loss[loss=0.07521, simple_loss=0.09825, pruned_loss=0.01517, audio_tagging_loss=0.01091, over 15518.00 frames. ], tot_loss[loss=0.07901, simple_loss=0.09892, pruned_loss=0.0192, audio_tagging_loss=0.01035, over 3050144.22 frames. ], batch size: 60, lr: 4.95e-03, grad_scale: 16.0 2023-11-20 11:50:36,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1074713.3333333333, ans=0.125 2023-11-20 11:50:48,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1074713.3333333333, ans=0.125 2023-11-20 11:50:56,979 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.282e+01 8.231e+01 8.987e+01 9.531e+01 1.955e+02, threshold=1.797e+02, percent-clipped=1.0 2023-11-20 11:50:57,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1074780.0, ans=0.2 2023-11-20 11:51:03,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1074846.6666666667, ans=10.0 2023-11-20 11:51:19,737 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.48 vs. 
limit=5.0 2023-11-20 11:51:30,875 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 161250 2023-11-20 11:51:43,152 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 4950, loss[loss=0.06853, simple_loss=0.09175, pruned_loss=0.01358, audio_tagging_loss=0.009073, over 14998.00 frames. ], tot_loss[loss=0.07899, simple_loss=0.09918, pruned_loss=0.01922, audio_tagging_loss=0.01018, over 3047064.06 frames. ], batch size: 55, lr: 4.95e-03, grad_scale: 16.0 2023-11-20 11:51:48,733 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.30 vs. limit=15.0 2023-11-20 11:51:55,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1075113.3333333333, ans=0.125 2023-11-20 11:52:03,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1075113.3333333333, ans=0.125 2023-11-20 11:52:30,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1075246.6666666667, ans=0.125 2023-11-20 11:52:36,390 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 161300 2023-11-20 11:52:47,456 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 5000, loss[loss=0.08604, simple_loss=0.111, pruned_loss=0.02387, audio_tagging_loss=0.006683, over 15767.00 frames. ], tot_loss[loss=0.0791, simple_loss=0.09965, pruned_loss=0.01928, audio_tagging_loss=0.009992, over 3045196.05 frames. ], batch size: 57, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 11:52:52,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1075380.0, ans=0.0 2023-11-20 11:53:07,298 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.854e+01 7.908e+01 8.687e+01 9.464e+01 1.260e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-20 11:53:13,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1075513.3333333333, ans=0.07 2023-11-20 11:53:20,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=1075513.3333333333, ans=0.025 2023-11-20 11:53:31,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1075580.0, ans=0.125 2023-11-20 11:53:41,364 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 161350 2023-11-20 11:53:49,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1075646.6666666667, ans=0.2 2023-11-20 11:53:49,184 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.36 vs. limit=15.0 2023-11-20 11:53:50,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1075646.6666666667, ans=0.0 2023-11-20 11:53:52,385 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 5050, loss[loss=0.07251, simple_loss=0.096, pruned_loss=0.01326, audio_tagging_loss=0.01126, over 15890.00 frames. ], tot_loss[loss=0.07921, simple_loss=0.1002, pruned_loss=0.01932, audio_tagging_loss=0.009781, over 3048936.14 frames. 
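], batch size: 60, lr: 4.94e-03, grad_scale: 16.0

The scaling.py:213 entries on either side of this point print ScheduledFloat values: hyperparameters such as dropout_p, skip rates, and balancer probabilities that are functions of batch_count rather than constants (by batch_count around 1.08e6 most have presumably settled at their final values). A minimal piecewise-linear reading of that mechanism, with invented breakpoints purely for illustration:

import bisect

class ScheduledValue:
    # A float that interpolates linearly between (batch_count, value)
    # breakpoints and holds the end values outside the given range.
    def __init__(self, *points):
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def __call__(self, batch_count: float) -> float:
        i = bisect.bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

dropout_p = ScheduledValue((0.0, 0.3), (20000.0, 0.1))
assert dropout_p(1076046.0) == 0.1   # long past the schedule, as in this run
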
2023-11-20 11:53:59,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1075713.3333333333, ans=0.1 2023-11-20 11:54:11,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1075780.0, ans=0.07 2023-11-20 11:54:26,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1075846.6666666667, ans=0.125 2023-11-20 11:54:46,753 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 161400 2023-11-20 11:54:55,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1075980.0, ans=0.2 2023-11-20 11:54:57,272 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.57 vs. limit=15.0 2023-11-20 11:54:58,787 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 5100, loss[loss=0.07274, simple_loss=0.0828, pruned_loss=0.01972, audio_tagging_loss=0.01162, over 14782.00 frames. ], tot_loss[loss=0.0786, simple_loss=0.09912, pruned_loss=0.01913, audio_tagging_loss=0.009908, over 3052420.35 frames. ], batch size: 55, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 11:55:09,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1076046.6666666667, ans=10.0 2023-11-20 11:55:16,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1076113.3333333333, ans=0.125 2023-11-20 11:55:17,834 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.894e+01 8.531e+01 9.239e+01 1.028e+02 1.398e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-20 11:55:18,156 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 11:55:21,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1076113.3333333333, ans=10.0 2023-11-20 11:55:48,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1076246.6666666667, ans=0.2 2023-11-20 11:55:51,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1076313.3333333333, ans=0.125 2023-11-20 11:55:52,382 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 161450 2023-11-20 11:55:58,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1076313.3333333333, ans=0.1 2023-11-20 11:56:04,131 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 5150, loss[loss=0.07006, simple_loss=0.09053, pruned_loss=0.0145, audio_tagging_loss=0.0103, over 15982.00 frames. ], tot_loss[loss=0.07908, simple_loss=0.09957, pruned_loss=0.01938, audio_tagging_loss=0.009913, over 3048950.52 frames. ], batch size: 62, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 11:56:26,983 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.20 vs.
limit=15.0 2023-11-20 11:56:30,880 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.94 vs. limit=15.0 2023-11-20 11:56:38,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1076513.3333333333, ans=0.125 2023-11-20 11:56:57,396 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 161500 2023-11-20 11:57:05,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1076646.6666666667, ans=0.125 2023-11-20 11:57:05,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1076646.6666666667, ans=0.04949747468305833 2023-11-20 11:57:07,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1076713.3333333333, ans=0.125 2023-11-20 11:57:09,012 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 5200, loss[loss=0.0668, simple_loss=0.08052, pruned_loss=0.01569, audio_tagging_loss=0.01084, over 16302.00 frames. ], tot_loss[loss=0.07873, simple_loss=0.09932, pruned_loss=0.01913, audio_tagging_loss=0.009935, over 3048532.88 frames. ], batch size: 65, lr: 4.94e-03, grad_scale: 32.0 2023-11-20 11:57:09,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1076713.3333333333, ans=0.125 2023-11-20 11:57:19,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1076713.3333333333, ans=0.125 2023-11-20 11:57:28,475 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.211e+01 8.173e+01 8.712e+01 9.541e+01 1.479e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-20 11:57:40,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1076846.6666666667, ans=0.0 2023-11-20 11:58:01,843 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 161550 2023-11-20 11:58:03,950 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.01 vs. limit=15.0 2023-11-20 11:58:08,029 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.97 vs. limit=22.5 2023-11-20 11:58:14,014 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 5250, loss[loss=0.1037, simple_loss=0.1353, pruned_loss=0.0294, audio_tagging_loss=0.006669, over 16298.00 frames. ], tot_loss[loss=0.07964, simple_loss=0.1006, pruned_loss=0.01945, audio_tagging_loss=0.009881, over 3046803.59 frames. ], batch size: 57, lr: 4.94e-03, grad_scale: 32.0 2023-11-20 11:58:30,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1077113.3333333333, ans=0.2 2023-11-20 11:59:04,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1077313.3333333333, ans=0.2 2023-11-20 11:59:06,188 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.70 vs. 
limit=15.0 2023-11-20 11:59:06,741 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 161600 2023-11-20 11:59:18,095 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 5300, loss[loss=0.06367, simple_loss=0.08041, pruned_loss=0.01452, audio_tagging_loss=0.008944, over 15524.00 frames. ], tot_loss[loss=0.07915, simple_loss=0.09988, pruned_loss=0.01929, audio_tagging_loss=0.009918, over 3039739.16 frames. ], batch size: 61, lr: 4.94e-03, grad_scale: 32.0 2023-11-20 11:59:29,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1077380.0, ans=0.0 2023-11-20 11:59:38,933 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.714e+01 8.511e+01 9.160e+01 9.855e+01 1.370e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-20 11:59:47,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1077513.3333333333, ans=0.125 2023-11-20 11:59:53,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1077513.3333333333, ans=0.125 2023-11-20 12:00:10,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1077646.6666666667, ans=0.125 2023-11-20 12:00:11,335 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 161650 2023-11-20 12:00:15,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1077646.6666666667, ans=0.125 2023-11-20 12:00:21,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1077646.6666666667, ans=10.0 2023-11-20 12:00:23,454 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 5350, loss[loss=0.08071, simple_loss=0.1032, pruned_loss=0.0172, audio_tagging_loss=0.0119, over 14810.00 frames. ], tot_loss[loss=0.07936, simple_loss=0.1004, pruned_loss=0.01928, audio_tagging_loss=0.00991, over 3040866.11 frames. ], batch size: 56, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 12:00:27,818 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.55 vs. 
limit=22.5 2023-11-20 12:00:32,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1077713.3333333333, ans=0.125 2023-11-20 12:00:50,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1077846.6666666667, ans=0.0 2023-11-20 12:00:54,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1077846.6666666667, ans=0.125 2023-11-20 12:01:03,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1077913.3333333333, ans=0.2 2023-11-20 12:01:09,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1077913.3333333333, ans=10.0 2023-11-20 12:01:15,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1077980.0, ans=0.125 2023-11-20 12:01:16,609 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 161700 2023-11-20 12:01:21,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1077980.0, ans=0.125 2023-11-20 12:01:28,045 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.11 vs. limit=12.0 2023-11-20 12:01:28,388 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 5400, loss[loss=0.07252, simple_loss=0.09202, pruned_loss=0.01695, audio_tagging_loss=0.009562, over 15134.00 frames. ], tot_loss[loss=0.07976, simple_loss=0.1012, pruned_loss=0.01933, audio_tagging_loss=0.00982, over 3040396.27 frames. ], batch size: 58, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 12:01:44,534 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.49 vs. limit=22.5 2023-11-20 12:01:48,643 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.745e+01 8.017e+01 8.531e+01 9.206e+01 1.102e+02, threshold=1.706e+02, percent-clipped=0.0 2023-11-20 12:02:10,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1078246.6666666667, ans=0.035 2023-11-20 12:02:10,538 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.61 vs. limit=15.0 2023-11-20 12:02:13,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1078246.6666666667, ans=0.025 2023-11-20 12:02:18,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1078313.3333333333, ans=0.125 2023-11-20 12:02:18,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1078313.3333333333, ans=0.1 2023-11-20 12:02:20,743 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 161750 2023-11-20 12:02:22,806 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 12:02:32,318 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 5450, loss[loss=0.04437, simple_loss=0.0527, pruned_loss=0.009204, audio_tagging_loss=0.008816, over 16668.00 frames. 
], tot_loss[loss=0.07962, simple_loss=0.1009, pruned_loss=0.01929, audio_tagging_loss=0.009884, over 3044322.51 frames. ], batch size: 66, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 12:02:41,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1078380.0, ans=0.2 2023-11-20 12:02:58,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1078513.3333333333, ans=0.0 2023-11-20 12:03:02,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1078513.3333333333, ans=0.125 2023-11-20 12:03:25,025 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 161800 2023-11-20 12:03:29,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1078646.6666666667, ans=0.125 2023-11-20 12:03:29,516 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.53 vs. limit=6.0 2023-11-20 12:03:31,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1078646.6666666667, ans=0.125 2023-11-20 12:03:32,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1078646.6666666667, ans=0.1 2023-11-20 12:03:37,514 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 5500, loss[loss=0.07428, simple_loss=0.08814, pruned_loss=0.01653, audio_tagging_loss=0.01368, over 15115.00 frames. ], tot_loss[loss=0.07929, simple_loss=0.1002, pruned_loss=0.01923, audio_tagging_loss=0.009965, over 3046440.32 frames. ], batch size: 57, lr: 4.94e-03, grad_scale: 8.0 2023-11-20 12:03:39,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1078713.3333333333, ans=0.125 2023-11-20 12:03:44,800 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.92 vs. limit=5.0 2023-11-20 12:03:47,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1078713.3333333333, ans=0.0 2023-11-20 12:03:59,053 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.875e+01 8.226e+01 8.739e+01 9.564e+01 1.235e+02, threshold=1.748e+02, percent-clipped=0.0 2023-11-20 12:04:12,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=1078846.6666666667, ans=0.1 2023-11-20 12:04:30,681 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 161850 2023-11-20 12:04:38,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1078980.0, ans=0.95 2023-11-20 12:04:42,371 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 5550, loss[loss=0.1061, simple_loss=0.1345, pruned_loss=0.03049, audio_tagging_loss=0.008375, over 15018.00 frames. ], tot_loss[loss=0.0794, simple_loss=0.1, pruned_loss=0.01929, audio_tagging_loss=0.0101, over 3043758.48 frames. ], batch size: 55, lr: 4.94e-03, grad_scale: 8.0 2023-11-20 12:04:58,559 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.95 vs. 
limit=12.0 2023-11-20 12:05:01,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1079113.3333333333, ans=0.2 2023-11-20 12:05:04,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1079113.3333333333, ans=0.1 2023-11-20 12:05:04,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1079113.3333333333, ans=0.125 2023-11-20 12:05:08,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1079180.0, ans=0.125 2023-11-20 12:05:10,872 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.95 vs. limit=15.0 2023-11-20 12:05:35,084 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=22.5 2023-11-20 12:05:35,638 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 161900 2023-11-20 12:05:40,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1079313.3333333333, ans=0.2 2023-11-20 12:05:47,154 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 5600, loss[loss=0.0739, simple_loss=0.09211, pruned_loss=0.01621, audio_tagging_loss=0.01163, over 15000.00 frames. ], tot_loss[loss=0.07912, simple_loss=0.0996, pruned_loss=0.01911, audio_tagging_loss=0.0102, over 3045882.57 frames. ], batch size: 57, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 12:05:49,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1079380.0, ans=0.0 2023-11-20 12:06:09,443 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.864e+01 8.052e+01 8.920e+01 9.702e+01 1.303e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-20 12:06:14,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1079513.3333333333, ans=0.0 2023-11-20 12:06:32,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1079580.0, ans=0.125 2023-11-20 12:06:35,279 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 12:06:40,257 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 161950 2023-11-20 12:06:43,219 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.03 vs. limit=15.0 2023-11-20 12:06:51,219 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 5650, loss[loss=0.0654, simple_loss=0.08632, pruned_loss=0.01214, audio_tagging_loss=0.01011, over 14945.00 frames. ], tot_loss[loss=0.0792, simple_loss=0.09986, pruned_loss=0.01912, audio_tagging_loss=0.01015, over 3048658.65 frames. 
], batch size: 57, lr: 4.93e-03, grad_scale: 16.0 2023-11-20 12:06:56,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1079713.3333333333, ans=0.125 2023-11-20 12:07:27,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1079846.6666666667, ans=0.0 2023-11-20 12:07:38,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1079913.3333333333, ans=0.125 2023-11-20 12:07:45,038 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 162000 2023-11-20 12:07:52,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1079980.0, ans=0.0 2023-11-20 12:07:56,864 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 5700, loss[loss=0.0767, simple_loss=0.09045, pruned_loss=0.02169, audio_tagging_loss=0.009788, over 15456.00 frames. ], tot_loss[loss=0.07876, simple_loss=0.0992, pruned_loss=0.01897, audio_tagging_loss=0.01018, over 3045188.13 frames. ], batch size: 58, lr: 4.93e-03, grad_scale: 16.0 2023-11-20 12:08:02,066 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.80 vs. limit=8.0 2023-11-20 12:08:10,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1080113.3333333333, ans=0.125 2023-11-20 12:08:13,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1080113.3333333333, ans=0.04949747468305833 2023-11-20 12:08:15,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1080113.3333333333, ans=0.1 2023-11-20 12:08:18,953 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 5.645e+01 7.906e+01 8.669e+01 9.570e+01 1.200e+02, threshold=1.734e+02, percent-clipped=0.0 2023-11-20 12:08:25,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1080180.0, ans=0.125 2023-11-20 12:08:26,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1080180.0, ans=0.125 2023-11-20 12:08:28,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1080180.0, ans=0.125 2023-11-20 12:08:36,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1080246.6666666667, ans=0.125 2023-11-20 12:08:38,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1080246.6666666667, ans=0.125 2023-11-20 12:08:50,345 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 162050 2023-11-20 12:09:02,116 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 5750, loss[loss=0.08202, simple_loss=0.09712, pruned_loss=0.02141, audio_tagging_loss=0.01205, over 16266.00 frames. ], tot_loss[loss=0.07864, simple_loss=0.0994, pruned_loss=0.01894, audio_tagging_loss=0.009997, over 3048763.18 frames. 
], batch size: 61, lr: 4.93e-03, grad_scale: 16.0 2023-11-20 12:09:04,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1080380.0, ans=0.125 2023-11-20 12:09:17,729 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.11 vs. limit=12.0 2023-11-20 12:09:23,321 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.98 vs. limit=22.5 2023-11-20 12:09:25,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1080446.6666666667, ans=0.125 2023-11-20 12:09:54,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1080646.6666666667, ans=0.0 2023-11-20 12:09:55,137 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 162100 2023-11-20 12:10:06,168 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 5800, loss[loss=0.06792, simple_loss=0.08412, pruned_loss=0.01137, audio_tagging_loss=0.01449, over 15194.00 frames. ], tot_loss[loss=0.07905, simple_loss=0.09962, pruned_loss=0.01929, audio_tagging_loss=0.009954, over 3045155.92 frames. ], batch size: 60, lr: 4.93e-03, grad_scale: 16.0 2023-11-20 12:10:06,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1080713.3333333333, ans=0.0 2023-11-20 12:10:13,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1080713.3333333333, ans=0.0 2023-11-20 12:10:28,580 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.36 vs. limit=15.0 2023-11-20 12:10:28,886 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.720e+01 8.138e+01 8.619e+01 9.335e+01 1.829e+02, threshold=1.724e+02, percent-clipped=1.0 2023-11-20 12:10:34,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1080846.6666666667, ans=0.125 2023-11-20 12:10:47,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1080913.3333333333, ans=0.125 2023-11-20 12:10:59,965 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 162150 2023-11-20 12:11:11,071 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 5850, loss[loss=0.07822, simple_loss=0.1019, pruned_loss=0.01736, audio_tagging_loss=0.009931, over 13959.00 frames. ], tot_loss[loss=0.07948, simple_loss=0.1002, pruned_loss=0.01947, audio_tagging_loss=0.009894, over 3041387.00 frames. ], batch size: 53, lr: 4.93e-03, grad_scale: 16.0 2023-11-20 12:11:31,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1081113.3333333333, ans=0.125 2023-11-20 12:12:04,363 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 162200 2023-11-20 12:12:08,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1081313.3333333333, ans=0.0 2023-11-20 12:12:16,984 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 5900, loss[loss=0.0714, simple_loss=0.07625, pruned_loss=0.01946, audio_tagging_loss=0.01381, over 14803.00 frames. 
], tot_loss[loss=0.07961, simple_loss=0.1007, pruned_loss=0.01942, audio_tagging_loss=0.009834, over 3043044.28 frames. ], batch size: 57, lr: 4.93e-03, grad_scale: 16.0 2023-11-20 12:12:38,402 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.547e+01 7.908e+01 8.564e+01 9.521e+01 1.124e+02, threshold=1.713e+02, percent-clipped=0.0 2023-11-20 12:12:48,020 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.26 vs. limit=6.0 2023-11-20 12:13:06,043 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.38 vs. limit=15.0 2023-11-20 12:13:10,392 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 162250 2023-11-20 12:13:21,349 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 5950, loss[loss=0.07977, simple_loss=0.1073, pruned_loss=0.01731, audio_tagging_loss=0.008833, over 15238.00 frames. ], tot_loss[loss=0.07955, simple_loss=0.1006, pruned_loss=0.01939, audio_tagging_loss=0.009848, over 3041180.80 frames. ], batch size: 56, lr: 4.93e-03, grad_scale: 16.0 2023-11-20 12:13:21,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1081713.3333333333, ans=0.125 2023-11-20 12:13:57,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1081846.6666666667, ans=0.2 2023-11-20 12:13:57,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1081846.6666666667, ans=0.125 2023-11-20 12:14:14,701 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 162300 2023-11-20 12:14:22,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1081980.0, ans=0.125 2023-11-20 12:14:23,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1081980.0, ans=0.125 2023-11-20 12:14:26,417 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 6000, loss[loss=0.07249, simple_loss=0.0905, pruned_loss=0.0187, audio_tagging_loss=0.008537, over 15306.00 frames. ], tot_loss[loss=0.07971, simple_loss=0.1008, pruned_loss=0.01949, audio_tagging_loss=0.009817, over 3046056.58 frames. ], batch size: 59, lr: 4.93e-03, grad_scale: 32.0 2023-11-20 12:14:26,422 INFO [train_asr.py:1285] (0/4) Computing validation loss 2023-11-20 12:15:06,554 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([1.9092, 3.1091, 3.5464, 2.8559, 3.8104, 3.7499, 3.3488, 3.1382], device='cuda:0') 2023-11-20 12:15:09,612 INFO [train_asr.py:1294] (0/4) Epoch 14, validation: loss=0.06225, simple_loss=0.05354, pruned_loss=0.005677, audio_tagging_loss=0.0298, over 4681554.00 frames. 
2023-11-20 12:15:09,613 INFO [train_asr.py:1295] (0/4) Maximum memory allocated so far is 25925MB 2023-11-20 12:15:16,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1082046.6666666667, ans=0.125 2023-11-20 12:15:17,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1082046.6666666667, ans=0.0 2023-11-20 12:15:17,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1082046.6666666667, ans=0.125 2023-11-20 12:15:17,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1082046.6666666667, ans=0.2 2023-11-20 12:15:21,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1082113.3333333333, ans=0.1 2023-11-20 12:15:31,314 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.652e+01 8.215e+01 8.669e+01 9.712e+01 1.545e+02, threshold=1.734e+02, percent-clipped=0.0 2023-11-20 12:15:36,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1082180.0, ans=0.125 2023-11-20 12:15:36,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1082180.0, ans=0.125 2023-11-20 12:15:38,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1082180.0, ans=0.0 2023-11-20 12:15:44,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1082180.0, ans=0.0 2023-11-20 12:15:47,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1082246.6666666667, ans=0.0 2023-11-20 12:15:52,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1082246.6666666667, ans=0.1 2023-11-20 12:15:58,700 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 12:16:02,661 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 162350 2023-11-20 12:16:13,764 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 6050, loss[loss=0.08939, simple_loss=0.1248, pruned_loss=0.01943, audio_tagging_loss=0.007566, over 14617.00 frames. ], tot_loss[loss=0.0792, simple_loss=0.1002, pruned_loss=0.01936, audio_tagging_loss=0.009728, over 3046307.12 frames. ], batch size: 55, lr: 4.93e-03, grad_scale: 32.0 2023-11-20 12:16:27,624 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.73 vs. limit=22.5 2023-11-20 12:16:32,141 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.68 vs. 
limit=22.5 2023-11-20 12:16:36,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1082446.6666666667, ans=0.125 2023-11-20 12:16:43,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1082513.3333333333, ans=0.2 2023-11-20 12:16:51,408 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.96 vs. limit=15.0 2023-11-20 12:16:57,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1082580.0, ans=0.125 2023-11-20 12:17:07,545 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 162400 2023-11-20 12:17:16,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1082646.6666666667, ans=0.04949747468305833 2023-11-20 12:17:19,051 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 6100, loss[loss=0.07528, simple_loss=0.09555, pruned_loss=0.01849, audio_tagging_loss=0.009019, over 15951.00 frames. ], tot_loss[loss=0.07911, simple_loss=0.1002, pruned_loss=0.01918, audio_tagging_loss=0.009806, over 3049115.28 frames. ], batch size: 59, lr: 4.93e-03, grad_scale: 32.0 2023-11-20 12:17:24,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1082713.3333333333, ans=0.1 2023-11-20 12:17:26,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1082713.3333333333, ans=0.1 2023-11-20 12:17:39,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1082780.0, ans=0.125 2023-11-20 12:17:41,820 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.187e+01 8.098e+01 8.908e+01 9.804e+01 1.147e+02, threshold=1.782e+02, percent-clipped=0.0 2023-11-20 12:17:49,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1082846.6666666667, ans=0.1 2023-11-20 12:17:50,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1082846.6666666667, ans=0.0 2023-11-20 12:17:53,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1082846.6666666667, ans=0.125 2023-11-20 12:18:10,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1082980.0, ans=0.125 2023-11-20 12:18:11,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1082980.0, ans=0.0 2023-11-20 12:18:12,865 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 162450 2023-11-20 12:18:24,018 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 6150, loss[loss=0.06579, simple_loss=0.07746, pruned_loss=0.01498, audio_tagging_loss=0.01208, over 15828.00 frames. ], tot_loss[loss=0.07902, simple_loss=0.09987, pruned_loss=0.01921, audio_tagging_loss=0.009869, over 3049462.20 frames. 
], batch size: 59, lr: 4.93e-03, grad_scale: 32.0 2023-11-20 12:18:27,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1083046.6666666667, ans=0.0 2023-11-20 12:18:34,166 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 12:19:02,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1083246.6666666667, ans=0.125 2023-11-20 12:19:17,357 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 162500 2023-11-20 12:19:29,091 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 6200, loss[loss=0.09272, simple_loss=0.1193, pruned_loss=0.02465, audio_tagging_loss=0.008433, over 15678.00 frames. ], tot_loss[loss=0.07869, simple_loss=0.0994, pruned_loss=0.01904, audio_tagging_loss=0.009953, over 3049989.94 frames. ], batch size: 57, lr: 4.93e-03, grad_scale: 32.0 2023-11-20 12:19:33,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1083380.0, ans=0.125 2023-11-20 12:19:42,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1083446.6666666667, ans=0.0 2023-11-20 12:19:48,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1083446.6666666667, ans=0.125 2023-11-20 12:19:51,147 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.044e+01 8.194e+01 8.727e+01 9.528e+01 1.309e+02, threshold=1.745e+02, percent-clipped=0.0 2023-11-20 12:19:55,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1083513.3333333333, ans=0.125 2023-11-20 12:20:02,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1083513.3333333333, ans=0.125 2023-11-20 12:20:21,998 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 162550 2023-11-20 12:20:26,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1083646.6666666667, ans=0.125 2023-11-20 12:20:33,738 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 6250, loss[loss=0.08034, simple_loss=0.1034, pruned_loss=0.01908, audio_tagging_loss=0.009549, over 15158.00 frames. ], tot_loss[loss=0.07906, simple_loss=0.09919, pruned_loss=0.01923, audio_tagging_loss=0.01023, over 3045237.76 frames. ], batch size: 58, lr: 4.93e-03, grad_scale: 32.0 2023-11-20 12:20:39,314 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.30 vs. limit=15.0 2023-11-20 12:21:04,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1083846.6666666667, ans=0.2 2023-11-20 12:21:07,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1083846.6666666667, ans=0.2 2023-11-20 12:21:08,415 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.48 vs. 
limit=15.0 2023-11-20 12:21:13,653 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.28 vs. limit=15.0 2023-11-20 12:21:26,689 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 162600 2023-11-20 12:21:30,336 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.09 vs. limit=15.0 2023-11-20 12:21:37,995 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 6300, loss[loss=0.07661, simple_loss=0.08683, pruned_loss=0.02199, audio_tagging_loss=0.0112, over 15090.00 frames. ], tot_loss[loss=0.07967, simple_loss=0.09974, pruned_loss=0.01956, audio_tagging_loss=0.01024, over 3045606.85 frames. ], batch size: 60, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:21:45,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1084046.6666666667, ans=0.0 2023-11-20 12:21:54,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1084113.3333333333, ans=0.0 2023-11-20 12:22:00,274 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.702e+01 8.158e+01 9.011e+01 9.819e+01 1.577e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-20 12:22:03,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1084180.0, ans=0.125 2023-11-20 12:22:23,156 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 12:22:28,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1084246.6666666667, ans=0.1 2023-11-20 12:22:32,227 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 162650 2023-11-20 12:22:40,094 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.38 vs. limit=22.5 2023-11-20 12:22:43,822 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 6350, loss[loss=0.07588, simple_loss=0.09827, pruned_loss=0.01661, audio_tagging_loss=0.01013, over 14900.00 frames. ], tot_loss[loss=0.07986, simple_loss=0.09997, pruned_loss=0.01951, audio_tagging_loss=0.01036, over 3048955.62 frames. ], batch size: 56, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:22:56,284 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.87 vs. 
limit=15.0 2023-11-20 12:23:18,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1084513.3333333333, ans=0.125 2023-11-20 12:23:18,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1084513.3333333333, ans=0.1 2023-11-20 12:23:35,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1084646.6666666667, ans=0.07 2023-11-20 12:23:36,505 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 162700 2023-11-20 12:23:41,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1084646.6666666667, ans=0.125 2023-11-20 12:23:48,118 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 6400, loss[loss=0.06396, simple_loss=0.073, pruned_loss=0.0157, audio_tagging_loss=0.01176, over 15760.00 frames. ], tot_loss[loss=0.07998, simple_loss=0.1003, pruned_loss=0.01948, audio_tagging_loss=0.01036, over 3048587.78 frames. ], batch size: 60, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:23:51,361 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.13 vs. limit=15.0 2023-11-20 12:24:02,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1084780.0, ans=0.0 2023-11-20 12:24:08,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1084780.0, ans=0.1 2023-11-20 12:24:10,366 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.846e+01 8.159e+01 8.686e+01 9.396e+01 1.221e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-20 12:24:15,917 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.18 vs. limit=15.0 2023-11-20 12:24:21,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1084846.6666666667, ans=0.125 2023-11-20 12:24:40,960 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 162750 2023-11-20 12:24:46,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1084980.0, ans=0.125 2023-11-20 12:24:50,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1084980.0, ans=0.125 2023-11-20 12:24:52,696 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 6450, loss[loss=0.09086, simple_loss=0.1227, pruned_loss=0.02117, audio_tagging_loss=0.008324, over 14749.00 frames. ], tot_loss[loss=0.07938, simple_loss=0.09923, pruned_loss=0.01932, audio_tagging_loss=0.01045, over 3045936.01 frames. ], batch size: 56, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:25:22,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1085180.0, ans=0.2 2023-11-20 12:25:23,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1085180.0, ans=0.125 2023-11-20 12:25:27,423 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.23 vs. 
limit=15.0 2023-11-20 12:25:29,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1085180.0, ans=0.125 2023-11-20 12:25:45,801 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 162800 2023-11-20 12:25:57,214 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 6500, loss[loss=0.09199, simple_loss=0.1124, pruned_loss=0.02337, audio_tagging_loss=0.0124, over 16212.00 frames. ], tot_loss[loss=0.07957, simple_loss=0.09966, pruned_loss=0.01935, audio_tagging_loss=0.01039, over 3049027.37 frames. ], batch size: 58, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:26:18,883 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.899e+01 8.195e+01 8.750e+01 9.288e+01 1.237e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-20 12:26:33,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1085513.3333333333, ans=0.2 2023-11-20 12:26:49,742 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 162850 2023-11-20 12:27:01,696 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 6550, loss[loss=0.06707, simple_loss=0.08257, pruned_loss=0.01451, audio_tagging_loss=0.01128, over 15012.00 frames. ], tot_loss[loss=0.07941, simple_loss=0.09972, pruned_loss=0.01936, audio_tagging_loss=0.01019, over 3041618.34 frames. ], batch size: 56, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:27:23,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1085780.0, ans=0.0 2023-11-20 12:27:30,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1085846.6666666667, ans=0.1 2023-11-20 12:27:32,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1085846.6666666667, ans=0.0 2023-11-20 12:27:54,783 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 162900 2023-11-20 12:28:06,259 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 6600, loss[loss=0.05786, simple_loss=0.06431, pruned_loss=0.01268, audio_tagging_loss=0.01303, over 14373.00 frames. ], tot_loss[loss=0.07929, simple_loss=0.09967, pruned_loss=0.01936, audio_tagging_loss=0.01009, over 3040763.29 frames. ], batch size: 55, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:28:24,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1086113.3333333333, ans=0.07 2023-11-20 12:28:27,639 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.865e+01 8.102e+01 8.649e+01 9.372e+01 1.515e+02, threshold=1.730e+02, percent-clipped=0.0 2023-11-20 12:28:30,951 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=15.0 2023-11-20 12:28:59,043 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 162950 2023-11-20 12:29:10,581 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 6650, loss[loss=0.09653, simple_loss=0.1289, pruned_loss=0.02483, audio_tagging_loss=0.007247, over 15127.00 frames. ], tot_loss[loss=0.07962, simple_loss=0.1006, pruned_loss=0.01939, audio_tagging_loss=0.00994, over 3050366.87 frames. 
], batch size: 55, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:29:12,835 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.44 vs. limit=22.5 2023-11-20 12:29:31,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1086446.6666666667, ans=0.125 2023-11-20 12:29:41,709 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.27 vs. limit=15.0 2023-11-20 12:29:45,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1086513.3333333333, ans=0.0 2023-11-20 12:29:57,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1086580.0, ans=0.0 2023-11-20 12:30:03,810 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 163000 2023-11-20 12:30:10,876 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.89 vs. limit=6.0 2023-11-20 12:30:15,838 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 6700, loss[loss=0.07043, simple_loss=0.08753, pruned_loss=0.01831, audio_tagging_loss=0.008356, over 14767.00 frames. ], tot_loss[loss=0.08031, simple_loss=0.1017, pruned_loss=0.01962, audio_tagging_loss=0.009859, over 3055196.34 frames. ], batch size: 57, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:30:18,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1086713.3333333333, ans=0.1 2023-11-20 12:30:19,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1086713.3333333333, ans=0.1 2023-11-20 12:30:33,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1086780.0, ans=0.1 2023-11-20 12:30:35,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1086780.0, ans=0.125 2023-11-20 12:30:37,675 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.914e+01 8.088e+01 8.794e+01 9.537e+01 1.389e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-20 12:30:52,093 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 12:31:07,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1086980.0, ans=0.5 2023-11-20 12:31:08,499 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 163050 2023-11-20 12:31:20,556 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 6750, loss[loss=0.06426, simple_loss=0.07771, pruned_loss=0.0136, audio_tagging_loss=0.0118, over 14862.00 frames. ], tot_loss[loss=0.0799, simple_loss=0.1012, pruned_loss=0.01947, audio_tagging_loss=0.009825, over 3052585.00 frames. 
], batch size: 57, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:31:24,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1087046.6666666667, ans=0.2 2023-11-20 12:31:24,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1087046.6666666667, ans=0.125 2023-11-20 12:31:34,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1087113.3333333333, ans=0.125 2023-11-20 12:32:02,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1087246.6666666667, ans=0.0 2023-11-20 12:32:13,376 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 163100 2023-11-20 12:32:16,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1087313.3333333333, ans=0.2 2023-11-20 12:32:16,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1087313.3333333333, ans=0.125 2023-11-20 12:32:24,822 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 6800, loss[loss=0.07038, simple_loss=0.08844, pruned_loss=0.01304, audio_tagging_loss=0.01312, over 16446.00 frames. ], tot_loss[loss=0.0794, simple_loss=0.1004, pruned_loss=0.01945, audio_tagging_loss=0.00977, over 3046654.66 frames. ], batch size: 61, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:32:38,681 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.84 vs. limit=15.0 2023-11-20 12:32:45,879 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.532e+01 7.924e+01 8.642e+01 9.401e+01 1.270e+02, threshold=1.728e+02, percent-clipped=0.0 2023-11-20 12:33:07,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1087580.0, ans=0.125 2023-11-20 12:33:17,157 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 163150 2023-11-20 12:33:19,157 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2023-11-20 12:33:28,590 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 6850, loss[loss=0.06682, simple_loss=0.08249, pruned_loss=0.01661, audio_tagging_loss=0.008964, over 14298.00 frames. ], tot_loss[loss=0.07965, simple_loss=0.1009, pruned_loss=0.01945, audio_tagging_loss=0.009765, over 3044430.77 frames. 
], batch size: 54, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:33:49,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1087780.0, ans=0.1 2023-11-20 12:34:06,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=1087913.3333333333, ans=22.5 2023-11-20 12:34:17,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1087913.3333333333, ans=0.0 2023-11-20 12:34:21,369 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 163200 2023-11-20 12:34:24,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1087980.0, ans=0.1 2023-11-20 12:34:32,707 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 6900, loss[loss=0.08259, simple_loss=0.1006, pruned_loss=0.02214, audio_tagging_loss=0.01013, over 14678.00 frames. ], tot_loss[loss=0.07913, simple_loss=0.1003, pruned_loss=0.01919, audio_tagging_loss=0.009787, over 3042581.67 frames. ], batch size: 57, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:34:40,755 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.12 vs. limit=15.0 2023-11-20 12:34:55,165 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.594e+01 8.124e+01 8.683e+01 9.436e+01 1.192e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-20 12:34:56,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1088113.3333333333, ans=0.125 2023-11-20 12:34:59,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1088180.0, ans=0.0 2023-11-20 12:35:13,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1088246.6666666667, ans=0.125 2023-11-20 12:35:17,335 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.44 vs. limit=12.0 2023-11-20 12:35:20,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1088246.6666666667, ans=0.1 2023-11-20 12:35:20,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1088246.6666666667, ans=0.125 2023-11-20 12:35:20,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1088246.6666666667, ans=0.0 2023-11-20 12:35:24,857 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 12:35:26,767 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 163250 2023-11-20 12:35:38,404 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 6950, loss[loss=0.09013, simple_loss=0.1139, pruned_loss=0.02312, audio_tagging_loss=0.01006, over 15609.00 frames. ], tot_loss[loss=0.08006, simple_loss=0.1018, pruned_loss=0.01946, audio_tagging_loss=0.00971, over 3048686.02 frames. ], batch size: 57, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:35:39,104 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.55 vs. limit=15.0 2023-11-20 12:35:50,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1088446.6666666667, ans=0.125 2023-11-20 12:36:03,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1088513.3333333333, ans=0.2 2023-11-20 12:36:17,889 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.64 vs. limit=15.0 2023-11-20 12:36:29,435 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 12:36:31,634 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 163300 2023-11-20 12:36:42,516 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 7000, loss[loss=0.05888, simple_loss=0.07129, pruned_loss=0.01242, audio_tagging_loss=0.01082, over 15069.00 frames. ], tot_loss[loss=0.07976, simple_loss=0.1014, pruned_loss=0.0193, audio_tagging_loss=0.00977, over 3053008.81 frames. ], batch size: 56, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:36:50,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1088713.3333333333, ans=0.125 2023-11-20 12:37:04,379 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.412e+01 8.025e+01 8.662e+01 9.457e+01 1.125e+02, threshold=1.732e+02, percent-clipped=0.0 2023-11-20 12:37:12,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1088846.6666666667, ans=0.0 2023-11-20 12:37:13,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1088846.6666666667, ans=0.0 2023-11-20 12:37:33,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1088980.0, ans=0.125 2023-11-20 12:37:35,893 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 163350 2023-11-20 12:37:46,701 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 7050, loss[loss=0.08318, simple_loss=0.1027, pruned_loss=0.02108, audio_tagging_loss=0.01072, over 15430.00 frames. ], tot_loss[loss=0.07918, simple_loss=0.1002, pruned_loss=0.01925, audio_tagging_loss=0.009847, over 3044338.78 frames. ], batch size: 57, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:37:51,597 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.62 vs. limit=12.0 2023-11-20 12:37:56,510 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.16 vs. 
limit=10.0 2023-11-20 12:38:06,167 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.02 vs. limit=15.0 2023-11-20 12:38:13,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1089180.0, ans=0.125 2023-11-20 12:38:15,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1089180.0, ans=0.05 2023-11-20 12:38:26,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1089246.6666666667, ans=10.0 2023-11-20 12:38:33,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1089246.6666666667, ans=0.0 2023-11-20 12:38:37,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1089313.3333333333, ans=0.125 2023-11-20 12:38:39,748 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 163400 2023-11-20 12:38:40,349 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.22 vs. limit=15.0 2023-11-20 12:38:41,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1089313.3333333333, ans=0.125 2023-11-20 12:38:47,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1089313.3333333333, ans=0.125 2023-11-20 12:38:51,934 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 7100, loss[loss=0.07594, simple_loss=0.1, pruned_loss=0.01777, audio_tagging_loss=0.008169, over 16030.00 frames. ], tot_loss[loss=0.07935, simple_loss=0.1004, pruned_loss=0.01926, audio_tagging_loss=0.009898, over 3044562.71 frames. ], batch size: 59, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:38:55,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1089380.0, ans=0.2 2023-11-20 12:39:03,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1089446.6666666667, ans=0.125 2023-11-20 12:39:08,836 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 12:39:14,020 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.595e+01 8.043e+01 8.663e+01 9.375e+01 1.240e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-20 12:39:19,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1089513.3333333333, ans=0.0 2023-11-20 12:39:35,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1089580.0, ans=0.2 2023-11-20 12:39:37,224 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.26 vs. limit=12.0 2023-11-20 12:39:45,220 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 163450 2023-11-20 12:39:56,025 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 7150, loss[loss=0.08091, simple_loss=0.1051, pruned_loss=0.01642, audio_tagging_loss=0.01194, over 16356.00 frames. 
], tot_loss[loss=0.07897, simple_loss=0.09954, pruned_loss=0.01911, audio_tagging_loss=0.0101, over 3050584.72 frames. ], batch size: 60, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:40:09,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1089780.0, ans=0.0 2023-11-20 12:40:16,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1089780.0, ans=0.0 2023-11-20 12:40:20,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1089846.6666666667, ans=0.1 2023-11-20 12:40:23,922 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.36 vs. limit=15.0 2023-11-20 12:40:44,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1089913.3333333333, ans=0.1 2023-11-20 12:40:48,635 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 163500 2023-11-20 12:40:50,990 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.26 vs. limit=22.5 2023-11-20 12:41:00,083 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 7200, loss[loss=0.06421, simple_loss=0.07192, pruned_loss=0.01743, audio_tagging_loss=0.01082, over 15917.00 frames. ], tot_loss[loss=0.07894, simple_loss=0.09958, pruned_loss=0.01906, audio_tagging_loss=0.0101, over 3043387.33 frames. ], batch size: 62, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:41:19,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1090113.3333333333, ans=0.125 2023-11-20 12:41:22,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1090113.3333333333, ans=0.05 2023-11-20 12:41:23,961 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.741e+01 8.075e+01 8.632e+01 9.241e+01 3.399e+02, threshold=1.726e+02, percent-clipped=1.0 2023-11-20 12:41:28,199 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 12:41:34,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1090180.0, ans=0.125 2023-11-20 12:41:53,262 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 163550 2023-11-20 12:41:54,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1090313.3333333333, ans=0.125 2023-11-20 12:42:00,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1090313.3333333333, ans=0.025 2023-11-20 12:42:04,799 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 7250, loss[loss=0.07172, simple_loss=0.09871, pruned_loss=0.01309, audio_tagging_loss=0.009274, over 14670.00 frames. ], tot_loss[loss=0.07951, simple_loss=0.1002, pruned_loss=0.01922, audio_tagging_loss=0.01018, over 3043476.09 frames. 
], batch size: 56, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:42:10,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1090380.0, ans=0.125 2023-11-20 12:42:22,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1090446.6666666667, ans=0.1 2023-11-20 12:42:54,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1090646.6666666667, ans=0.125 2023-11-20 12:42:56,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1090646.6666666667, ans=0.1 2023-11-20 12:42:57,792 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 163600 2023-11-20 12:43:07,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1090646.6666666667, ans=0.125 2023-11-20 12:43:08,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1090713.3333333333, ans=0.125 2023-11-20 12:43:09,580 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 7300, loss[loss=0.06881, simple_loss=0.07877, pruned_loss=0.01365, audio_tagging_loss=0.01578, over 14652.00 frames. ], tot_loss[loss=0.08006, simple_loss=0.1008, pruned_loss=0.01948, audio_tagging_loss=0.01016, over 3043661.92 frames. ], batch size: 59, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:43:23,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1090780.0, ans=0.0 2023-11-20 12:43:31,927 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.842e+01 8.253e+01 8.937e+01 9.591e+01 1.343e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-20 12:43:35,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1090846.6666666667, ans=0.0 2023-11-20 12:43:43,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1090846.6666666667, ans=0.05 2023-11-20 12:44:01,925 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 163650 2023-11-20 12:44:09,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1090980.0, ans=0.0 2023-11-20 12:44:12,908 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.98 vs. limit=15.0 2023-11-20 12:44:13,423 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 7350, loss[loss=0.06723, simple_loss=0.08221, pruned_loss=0.01436, audio_tagging_loss=0.01176, over 16451.00 frames. ], tot_loss[loss=0.08015, simple_loss=0.1012, pruned_loss=0.01948, audio_tagging_loss=0.01006, over 3047812.09 frames. 
], batch size: 64, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:44:47,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1091180.0, ans=0.125 2023-11-20 12:45:05,637 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 163700 2023-11-20 12:45:17,050 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 7400, loss[loss=0.07918, simple_loss=0.09856, pruned_loss=0.02259, audio_tagging_loss=0.007312, over 14518.00 frames. ], tot_loss[loss=0.07981, simple_loss=0.1011, pruned_loss=0.01936, audio_tagging_loss=0.009886, over 3038024.63 frames. ], batch size: 57, lr: 4.91e-03, grad_scale: 16.0 2023-11-20 12:45:33,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1091446.6666666667, ans=0.125 2023-11-20 12:45:34,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1091446.6666666667, ans=0.125 2023-11-20 12:45:41,933 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.771e+01 8.379e+01 9.065e+01 9.616e+01 1.278e+02, threshold=1.813e+02, percent-clipped=0.0 2023-11-20 12:45:49,505 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.77 vs. limit=15.0 2023-11-20 12:45:55,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1091580.0, ans=0.125 2023-11-20 12:46:03,741 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.25 vs. limit=15.0 2023-11-20 12:46:09,697 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.21 vs. limit=15.0 2023-11-20 12:46:10,331 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 163750 2023-11-20 12:46:21,950 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 7450, loss[loss=0.07371, simple_loss=0.09956, pruned_loss=0.01644, audio_tagging_loss=0.007483, over 15188.00 frames. ], tot_loss[loss=0.07941, simple_loss=0.1009, pruned_loss=0.01918, audio_tagging_loss=0.009788, over 3041475.94 frames. ], batch size: 59, lr: 4.91e-03, grad_scale: 16.0 2023-11-20 12:46:55,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1091846.6666666667, ans=0.07 2023-11-20 12:46:55,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1091846.6666666667, ans=0.09899494936611666 2023-11-20 12:47:15,632 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 163800 2023-11-20 12:47:17,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=1091980.0, ans=15.0 2023-11-20 12:47:27,457 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 7500, loss[loss=0.09126, simple_loss=0.1218, pruned_loss=0.02015, audio_tagging_loss=0.0102, over 16459.00 frames. ], tot_loss[loss=0.0792, simple_loss=0.1009, pruned_loss=0.01908, audio_tagging_loss=0.009669, over 3047820.42 frames. 
2023-11-20 12:47:36,424 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.73 vs. limit=22.5
2023-11-20 12:47:47,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1092113.3333333333, ans=0.0
2023-11-20 12:47:51,399 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.921e+01 8.205e+01 8.885e+01 9.811e+01 1.439e+02, threshold=1.777e+02, percent-clipped=0.0
2023-11-20 12:47:51,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1092180.0, ans=0.0
2023-11-20 12:48:15,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1092246.6666666667, ans=0.125
2023-11-20 12:48:19,231 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 163850
2023-11-20 12:48:24,212 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-20 12:48:30,689 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 7550, loss[loss=0.09273, simple_loss=0.1292, pruned_loss=0.02262, audio_tagging_loss=0.005525, over 13901.00 frames. ], tot_loss[loss=0.07912, simple_loss=0.1007, pruned_loss=0.01913, audio_tagging_loss=0.009634, over 3040429.27 frames. ], batch size: 53, lr: 4.91e-03, grad_scale: 16.0
2023-11-20 12:48:56,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1092513.3333333333, ans=0.125
2023-11-20 12:49:00,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1092513.3333333333, ans=0.0
2023-11-20 12:49:09,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1092580.0, ans=0.035
2023-11-20 12:49:23,461 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 163900
2023-11-20 12:49:26,062 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 12:49:27,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1092646.6666666667, ans=0.5
2023-11-20 12:49:34,875 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 7600, loss[loss=0.0772, simple_loss=0.1027, pruned_loss=0.01737, audio_tagging_loss=0.008494, over 15960.00 frames. ], tot_loss[loss=0.07964, simple_loss=0.1013, pruned_loss=0.01932, audio_tagging_loss=0.009684, over 3048558.66 frames. ], batch size: 62, lr: 4.91e-03, grad_scale: 32.0
2023-11-20 12:49:35,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1092713.3333333333, ans=0.04949747468305833
2023-11-20 12:49:55,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1092780.0, ans=0.125
2023-11-20 12:49:59,231 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.781e+01 8.009e+01 8.543e+01 9.202e+01 1.104e+02, threshold=1.709e+02, percent-clipped=0.0
2023-11-20 12:49:59,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1092846.6666666667, ans=0.0
2023-11-20 12:49:59,957 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.57 vs. limit=12.0
2023-11-20 12:50:11,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1092846.6666666667, ans=0.125
2023-11-20 12:50:27,545 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 163950
2023-11-20 12:50:39,682 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 7650, loss[loss=0.08723, simple_loss=0.1098, pruned_loss=0.02358, audio_tagging_loss=0.008779, over 14654.00 frames. ], tot_loss[loss=0.07899, simple_loss=0.1004, pruned_loss=0.01908, audio_tagging_loss=0.009735, over 3038780.85 frames. ], batch size: 59, lr: 4.90e-03, grad_scale: 32.0
2023-11-20 12:50:40,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1093046.6666666667, ans=0.125
2023-11-20 12:50:48,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1093046.6666666667, ans=0.125
2023-11-20 12:51:06,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1093180.0, ans=0.0
2023-11-20 12:51:10,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1093180.0, ans=0.125
2023-11-20 12:51:20,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1093246.6666666667, ans=0.2
2023-11-20 12:51:23,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1093246.6666666667, ans=0.125
2023-11-20 12:51:31,749 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 164000
2023-11-20 12:51:33,257 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-164000.pt
2023-11-20 12:51:47,518 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 7700, loss[loss=0.05877, simple_loss=0.07834, pruned_loss=0.01105, audio_tagging_loss=0.008548, over 13859.00 frames. ], tot_loss[loss=0.0791, simple_loss=0.1007, pruned_loss=0.01902, audio_tagging_loss=0.009743, over 3048632.26 frames. ], batch size: 54, lr: 4.90e-03, grad_scale: 32.0
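The checkpoint.py line above fires exactly at batch idx 164000, consistent with batch-indexed checkpoints being written every fixed number of training batches. A sketch of that trigger; the interval and the helper name are assumptions:

    import torch
    from pathlib import Path

    def maybe_save_checkpoint(model, optimizer, batch_idx_train: int,
                              exp_dir: Path, save_every_n: int = 4000) -> None:
        # Write a batch-indexed checkpoint whenever the global batch count
        # hits a multiple of save_every_n (the interval here is an assumption).
        if batch_idx_train % save_every_n != 0:
            return
        ckpt = {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "batch_idx_train": batch_idx_train,
        }
        torch.save(ckpt, exp_dir / f"checkpoint-{batch_idx_train}.pt")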
2023-11-20 12:51:49,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1093380.0, ans=0.1
2023-11-20 12:52:00,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1093446.6666666667, ans=0.2
2023-11-20 12:52:06,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1093446.6666666667, ans=0.0
2023-11-20 12:52:11,271 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.534e+01 7.870e+01 8.763e+01 9.438e+01 1.322e+02, threshold=1.753e+02, percent-clipped=0.0
2023-11-20 12:52:21,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1093513.3333333333, ans=0.125
2023-11-20 12:52:24,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1093580.0, ans=0.04949747468305833
2023-11-20 12:52:31,382 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.69 vs. limit=12.0
2023-11-20 12:52:39,894 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 164050
2023-11-20 12:52:42,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1093646.6666666667, ans=0.1
2023-11-20 12:52:44,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1093646.6666666667, ans=0.2
2023-11-20 12:52:49,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1093646.6666666667, ans=0.2
2023-11-20 12:52:51,394 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 7750, loss[loss=0.08591, simple_loss=0.1061, pruned_loss=0.02505, audio_tagging_loss=0.007792, over 15433.00 frames. ], tot_loss[loss=0.07894, simple_loss=0.1001, pruned_loss=0.01903, audio_tagging_loss=0.009837, over 3050044.90 frames. ], batch size: 55, lr: 4.90e-03, grad_scale: 32.0
2023-11-20 12:53:12,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1093780.0, ans=0.125
2023-11-20 12:53:21,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1093846.6666666667, ans=0.125
2023-11-20 12:53:44,056 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 164100
2023-11-20 12:53:55,669 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 7800, loss[loss=0.08456, simple_loss=0.09767, pruned_loss=0.02386, audio_tagging_loss=0.01187, over 14682.00 frames. ], tot_loss[loss=0.0791, simple_loss=0.1002, pruned_loss=0.01913, audio_tagging_loss=0.009855, over 3044728.82 frames. ], batch size: 55, lr: 4.90e-03, grad_scale: 32.0
2023-11-20 12:54:11,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1094113.3333333333, ans=0.125
2023-11-20 12:54:20,241 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.543e+01 8.289e+01 9.083e+01 1.010e+02 1.614e+02, threshold=1.817e+02, percent-clipped=0.0
2023-11-20 12:54:48,375 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 164150
2023-11-20 12:54:59,841 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 7850, loss[loss=0.0655, simple_loss=0.08077, pruned_loss=0.01564, audio_tagging_loss=0.009473, over 15019.00 frames. ], tot_loss[loss=0.07922, simple_loss=0.1005, pruned_loss=0.01915, audio_tagging_loss=0.009834, over 3048684.32 frames. ], batch size: 57, lr: 4.90e-03, grad_scale: 32.0
2023-11-20 12:55:00,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1094380.0, ans=0.5
2023-11-20 12:55:48,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1094580.0, ans=10.0
2023-11-20 12:55:53,656 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 164200
2023-11-20 12:56:05,324 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 7900, loss[loss=0.05406, simple_loss=0.06443, pruned_loss=0.009607, audio_tagging_loss=0.01224, over 14677.00 frames. ], tot_loss[loss=0.07983, simple_loss=0.1007, pruned_loss=0.01946, audio_tagging_loss=0.01, over 3049725.83 frames. ], batch size: 57, lr: 4.90e-03, grad_scale: 32.0
2023-11-20 12:56:10,882 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.79 vs. limit=15.0
2023-11-20 12:56:25,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1094780.0, ans=0.0
2023-11-20 12:56:28,981 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.813e+01 8.207e+01 9.087e+01 9.691e+01 1.187e+02, threshold=1.817e+02, percent-clipped=0.0
2023-11-20 12:56:58,017 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 164250
2023-11-20 12:57:09,132 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 7950, loss[loss=0.06228, simple_loss=0.08578, pruned_loss=0.009447, audio_tagging_loss=0.009947, over 14421.00 frames. ], tot_loss[loss=0.07991, simple_loss=0.1007, pruned_loss=0.01936, audio_tagging_loss=0.01021, over 3046512.81 frames. ], batch size: 53, lr: 4.90e-03, grad_scale: 32.0
2023-11-20 12:57:09,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1095046.6666666667, ans=0.0
2023-11-20 12:57:14,226 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.81 vs. limit=22.5
2023-11-20 12:57:16,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1095046.6666666667, ans=0.2
2023-11-20 12:57:26,271 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
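The WARNING above drops an AudioSet placeholder cut: after the encoder's subsampling the utterance has fewer frames (23) than BPE tokens (24), and transducer-style losses need at least one frame per emitted token. A sketch of such a filter; the names are illustrative, and the subsampled-length arithmetic depends on the convolutional frontend, so the formula below is just one that reproduces the logged 100 -> 23 mapping:

    import logging

    def keep_cut(cut, sp) -> bool:
        # Exclude cuts whose encoder output would be shorter than the token
        # sequence (T < U), which makes the transducer loss ill-posed.
        T = cut.num_frames                 # frames before subsampling
        T_sub = ((T - 3) // 2 - 1) // 2    # assumed frontend arithmetic; 100 -> 23
        tokens = sp.encode(cut.supervisions[0].text, out_type=str)
        if T_sub < len(tokens):
            logging.warning(f"Exclude cut with ID {cut.id} from training. ...")
            return False
        return True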
2023-11-20 12:57:34,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1095180.0, ans=0.125
2023-11-20 12:58:02,392 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 164300
2023-11-20 12:58:09,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1095313.3333333333, ans=0.1
2023-11-20 12:58:13,254 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 8000, loss[loss=0.0917, simple_loss=0.1297, pruned_loss=0.02094, audio_tagging_loss=0.005909, over 16682.00 frames. ], tot_loss[loss=0.07985, simple_loss=0.1007, pruned_loss=0.01928, audio_tagging_loss=0.01022, over 3046200.99 frames. ], batch size: 62, lr: 4.90e-03, grad_scale: 32.0
2023-11-20 12:58:37,627 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.04 vs. limit=22.5
2023-11-20 12:58:38,836 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.70 vs. limit=6.0
2023-11-20 12:58:39,386 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.429e+01 8.050e+01 8.646e+01 9.454e+01 1.422e+02, threshold=1.729e+02, percent-clipped=0.0
2023-11-20 12:59:06,801 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 164350
2023-11-20 12:59:11,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1095646.6666666667, ans=0.125
2023-11-20 12:59:13,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1095646.6666666667, ans=0.125
2023-11-20 12:59:17,714 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 8050, loss[loss=0.08711, simple_loss=0.1084, pruned_loss=0.02501, audio_tagging_loss=0.007902, over 15848.00 frames. ], tot_loss[loss=0.08075, simple_loss=0.1021, pruned_loss=0.01949, audio_tagging_loss=0.01022, over 3044003.21 frames. ], batch size: 58, lr: 4.90e-03, grad_scale: 16.0
2023-11-20 13:00:10,889 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 164400
2023-11-20 13:00:21,710 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.48 vs. limit=15.0
2023-11-20 13:00:22,245 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 8100, loss[loss=0.05888, simple_loss=0.0801, pruned_loss=0.01057, audio_tagging_loss=0.008263, over 16686.00 frames. ], tot_loss[loss=0.08061, simple_loss=0.1021, pruned_loss=0.01949, audio_tagging_loss=0.01008, over 3052432.43 frames. ], batch size: 65, lr: 4.90e-03, grad_scale: 8.0
2023-11-20 13:00:47,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1096180.0, ans=0.2
2023-11-20 13:00:50,966 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.681e+01 9.657e+01 1.040e+02 1.994e+02, threshold=1.931e+02, percent-clipped=2.0
2023-11-20 13:01:07,249 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.03 vs. limit=8.0
2023-11-20 13:01:07,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1096246.6666666667, ans=0.125
2023-11-20 13:01:11,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1096246.6666666667, ans=0.04949747468305833
2023-11-20 13:01:11,877 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.37 vs. limit=6.0
2023-11-20 13:01:15,045 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 164450
2023-11-20 13:01:26,638 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 8150, loss[loss=0.09125, simple_loss=0.1168, pruned_loss=0.02554, audio_tagging_loss=0.007297, over 16075.00 frames. ], tot_loss[loss=0.08029, simple_loss=0.1016, pruned_loss=0.01952, audio_tagging_loss=0.009981, over 3043274.00 frames. ], batch size: 58, lr: 4.90e-03, grad_scale: 8.0
2023-11-20 13:01:52,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1096513.3333333333, ans=0.125
2023-11-20 13:01:59,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1096513.3333333333, ans=0.125
2023-11-20 13:02:02,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1096513.3333333333, ans=0.0
2023-11-20 13:02:03,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=1096513.3333333333, ans=15.0
2023-11-20 13:02:05,688 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.13 vs. limit=12.0
2023-11-20 13:02:17,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1096646.6666666667, ans=0.125
2023-11-20 13:02:19,693 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 164500
2023-11-20 13:02:22,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1096646.6666666667, ans=0.125
2023-11-20 13:02:24,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1096646.6666666667, ans=0.09899494936611666
2023-11-20 13:02:31,747 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 8200, loss[loss=0.09306, simple_loss=0.114, pruned_loss=0.02526, audio_tagging_loss=0.01081, over 15113.00 frames. ], tot_loss[loss=0.0807, simple_loss=0.1026, pruned_loss=0.01953, audio_tagging_loss=0.00988, over 3047036.81 frames. ], batch size: 57, lr: 4.90e-03, grad_scale: 8.0
2023-11-20 13:02:33,025 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
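Each optim.py:476 line reports the min/25%/median/75%/max of recently observed gradient norms, and the clipping threshold tracks roughly clipping_scale times the running median (e.g. 1.931e+02 ≈ 2.0 × 9.657e+01 just above); percent-clipped says how often the norm exceeded it. A simplified stand-in for that adaptive clipping, assuming a plain windowed median (ScaledAdam's real version differs in details):

    import torch
    from collections import deque

    class MedianGradClipper:
        def __init__(self, clipping_scale: float = 2.0, window: int = 128):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)  # recent gradient norms

        def clip_(self, parameters) -> float:
            grads = [p.grad for p in parameters if p.grad is not None]
            norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
            self.norms.append(norm)
            median = sorted(self.norms)[len(self.norms) // 2]
            threshold = self.clipping_scale * median  # the logged "threshold"
            if norm > threshold:                      # counted in percent-clipped
                for g in grads:
                    g.mul_(threshold / norm)
            return norm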
2023-11-20 13:02:37,679 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.88 vs. limit=5.0
2023-11-20 13:02:38,808 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-20 13:02:44,063 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.17 vs. limit=22.5
2023-11-20 13:02:49,265 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.00 vs. limit=15.0
2023-11-20 13:03:00,016 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.486e+01 8.378e+01 8.960e+01 9.917e+01 5.109e+02, threshold=1.792e+02, percent-clipped=1.0
2023-11-20 13:03:05,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1096846.6666666667, ans=0.0
2023-11-20 13:03:24,746 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 164550
2023-11-20 13:03:29,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1096980.0, ans=0.07
2023-11-20 13:03:36,292 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 8250, loss[loss=0.07637, simple_loss=0.0957, pruned_loss=0.0166, audio_tagging_loss=0.01192, over 14836.00 frames. ], tot_loss[loss=0.08008, simple_loss=0.1017, pruned_loss=0.01938, audio_tagging_loss=0.009843, over 3046823.80 frames. ], batch size: 58, lr: 4.90e-03, grad_scale: 8.0
2023-11-20 13:03:40,638 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.26 vs. limit=22.5
2023-11-20 13:03:53,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1097113.3333333333, ans=10.0
2023-11-20 13:04:28,667 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 164600
2023-11-20 13:04:39,953 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.72 vs. limit=15.0
2023-11-20 13:04:40,209 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 8300, loss[loss=0.05298, simple_loss=0.05401, pruned_loss=0.01275, audio_tagging_loss=0.01322, over 14905.00 frames. ], tot_loss[loss=0.08, simple_loss=0.1013, pruned_loss=0.0195, audio_tagging_loss=0.009847, over 3049150.35 frames. ], batch size: 56, lr: 4.90e-03, grad_scale: 8.0
2023-11-20 13:04:50,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1097380.0, ans=0.125
2023-11-20 13:05:08,982 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.931e+01 8.247e+01 8.781e+01 9.736e+01 1.742e+02, threshold=1.756e+02, percent-clipped=0.0
2023-11-20 13:05:11,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1097513.3333333333, ans=0.125
2023-11-20 13:05:16,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1097513.3333333333, ans=0.125
2023-11-20 13:05:32,906 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 164650
2023-11-20 13:05:34,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1097646.6666666667, ans=0.0
2023-11-20 13:05:44,413 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 8350, loss[loss=0.0678, simple_loss=0.08965, pruned_loss=0.0105, audio_tagging_loss=0.01247, over 15528.00 frames. ], tot_loss[loss=0.07976, simple_loss=0.1013, pruned_loss=0.01936, audio_tagging_loss=0.009762, over 3056274.86 frames. ], batch size: 58, lr: 4.89e-03, grad_scale: 8.0
2023-11-20 13:05:50,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1097713.3333333333, ans=0.125
2023-11-20 13:06:13,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1097846.6666666667, ans=0.0
2023-11-20 13:06:28,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1097913.3333333333, ans=0.0
2023-11-20 13:06:30,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1097913.3333333333, ans=0.1
2023-11-20 13:06:36,769 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 164700
2023-11-20 13:06:38,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1097980.0, ans=0.125
2023-11-20 13:06:44,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1097980.0, ans=0.2
2023-11-20 13:06:46,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1097980.0, ans=0.125
2023-11-20 13:06:48,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1098046.6666666667, ans=0.2
2023-11-20 13:06:49,020 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 8400, loss[loss=0.06966, simple_loss=0.09194, pruned_loss=0.0124, audio_tagging_loss=0.01128, over 15336.00 frames. ], tot_loss[loss=0.07937, simple_loss=0.1009, pruned_loss=0.01918, audio_tagging_loss=0.009741, over 3057973.12 frames. ], batch size: 56, lr: 4.89e-03, grad_scale: 16.0
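The Whitening lines each compare a module's whiteness statistic against its limit (metric=x vs. limit=y); a corrective gradient is only applied when the activations' covariance drifts too far from a scaled identity. A simplified stand-in for the metric, assuming no grouping (the actual scaling.py computation differs in details such as the num_groups split logged above):

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels). For a perfectly white signal the
        # covariance is proportional to the identity and this ratio is 1.0;
        # it grows toward num_channels as energy concentrates in few directions.
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]
        c = cov.shape[0]
        return ((cov ** 2).sum() / ((cov.diag().sum() / c) ** 2 * c)).item()

    x = torch.randn(1000, 512)      # roughly white input
    print(whitening_metric(x))      # ~1.0, comfortably under a limit like 15.0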
2023-11-20 13:07:10,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1098113.3333333333, ans=0.025
2023-11-20 13:07:17,342 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.406e+01 7.789e+01 8.682e+01 9.296e+01 1.321e+02, threshold=1.736e+02, percent-clipped=0.0
2023-11-20 13:07:17,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1098180.0, ans=0.0
2023-11-20 13:07:41,751 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 164750
2023-11-20 13:07:46,046 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.70 vs. limit=15.0
2023-11-20 13:07:53,341 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 8450, loss[loss=0.09629, simple_loss=0.1351, pruned_loss=0.02059, audio_tagging_loss=0.008127, over 16221.00 frames. ], tot_loss[loss=0.08006, simple_loss=0.102, pruned_loss=0.01936, audio_tagging_loss=0.009717, over 3050638.21 frames. ], batch size: 58, lr: 4.89e-03, grad_scale: 16.0
2023-11-20 13:08:12,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1098446.6666666667, ans=0.1
2023-11-20 13:08:13,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1098446.6666666667, ans=0.1
2023-11-20 13:08:33,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1098580.0, ans=0.125
2023-11-20 13:08:46,024 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 164800
2023-11-20 13:08:51,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1098646.6666666667, ans=0.125
2023-11-20 13:08:57,710 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 8500, loss[loss=0.06145, simple_loss=0.08796, pruned_loss=0.01195, audio_tagging_loss=0.005516, over 15154.00 frames. ], tot_loss[loss=0.08001, simple_loss=0.1015, pruned_loss=0.01947, audio_tagging_loss=0.009785, over 3052298.97 frames. ], batch size: 56, lr: 4.89e-03, grad_scale: 16.0
2023-11-20 13:09:03,260 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.32 vs. limit=22.5
2023-11-20 13:09:15,054 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.91 vs. limit=15.0
2023-11-20 13:09:25,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=1098846.6666666667, ans=15.0
2023-11-20 13:09:25,692 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.462e+01 8.485e+01 9.217e+01 9.923e+01 2.190e+02, threshold=1.843e+02, percent-clipped=1.0
2023-11-20 13:09:50,098 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 164850
2023-11-20 13:09:55,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1098980.0, ans=0.0
2023-11-20 13:10:02,336 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 8550, loss[loss=0.07056, simple_loss=0.09412, pruned_loss=0.01351, audio_tagging_loss=0.009986, over 15027.00 frames. ], tot_loss[loss=0.07996, simple_loss=0.1012, pruned_loss=0.01951, audio_tagging_loss=0.009844, over 3056131.20 frames. ], batch size: 58, lr: 4.89e-03, grad_scale: 16.0
2023-11-20 13:10:02,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1099046.6666666667, ans=0.0
2023-11-20 13:10:08,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1099046.6666666667, ans=0.2
2023-11-20 13:10:08,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1099046.6666666667, ans=0.1
2023-11-20 13:10:26,722 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.45 vs. limit=22.5
2023-11-20 13:10:55,035 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 164900
2023-11-20 13:11:03,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1099313.3333333333, ans=0.0
2023-11-20 13:11:06,508 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 8600, loss[loss=0.08089, simple_loss=0.1038, pruned_loss=0.02061, audio_tagging_loss=0.008406, over 15162.00 frames. ], tot_loss[loss=0.0797, simple_loss=0.1006, pruned_loss=0.01948, audio_tagging_loss=0.009906, over 3052084.14 frames. ], batch size: 56, lr: 4.89e-03, grad_scale: 16.0
2023-11-20 13:11:14,335 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.38 vs. limit=15.0
2023-11-20 13:11:24,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1099446.6666666667, ans=0.0
2023-11-20 13:11:28,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1099446.6666666667, ans=0.125
2023-11-20 13:11:30,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1099513.3333333333, ans=0.2
2023-11-20 13:11:33,751 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.330e+01 8.002e+01 8.794e+01 9.640e+01 1.342e+02, threshold=1.759e+02, percent-clipped=0.0
2023-11-20 13:11:50,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1099580.0, ans=0.125
2023-11-20 13:11:54,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1099580.0, ans=0.0
2023-11-20 13:11:58,949 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 164950
2023-11-20 13:12:05,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=1099646.6666666667, ans=0.02
2023-11-20 13:12:10,616 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 8650, loss[loss=0.09116, simple_loss=0.119, pruned_loss=0.02203, audio_tagging_loss=0.009616, over 15751.00 frames. ], tot_loss[loss=0.08052, simple_loss=0.1017, pruned_loss=0.01971, audio_tagging_loss=0.009966, over 3049210.62 frames. ], batch size: 55, lr: 4.89e-03, grad_scale: 16.0
2023-11-20 13:12:13,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1099713.3333333333, ans=0.125
2023-11-20 13:12:21,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1099780.0, ans=0.125
2023-11-20 13:12:27,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1099780.0, ans=0.0
2023-11-20 13:12:28,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1099780.0, ans=0.125
2023-11-20 13:12:52,623 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.84 vs. limit=10.0
2023-11-20 13:12:54,998 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.99 vs. limit=15.0
2023-11-20 13:12:59,630 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. limit=6.0
2023-11-20 13:13:03,814 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 165000
2023-11-20 13:13:15,584 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 8700, loss[loss=0.09758, simple_loss=0.1269, pruned_loss=0.02693, audio_tagging_loss=0.00718, over 16070.00 frames. ], tot_loss[loss=0.08052, simple_loss=0.1015, pruned_loss=0.01976, audio_tagging_loss=0.01002, over 3054439.23 frames. ], batch size: 58, lr: 4.89e-03, grad_scale: 16.0
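In the train_asr.py:1262 lines, loss[...] is the current batch (weighted by its frame count) while tot_loss[...] is a frame-weighted running average, which is why it is quoted "over" roughly 3M frames and moves slowly. A sketch of that bookkeeping; the decay constant is an assumption chosen to give a few-million-frame window:

    class RunningLoss:
        """Frame-weighted running average in the spirit of tot_loss[...]."""
        def __init__(self, decay: float = 0.995):
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss_sum: float, batch_frames: float) -> None:
            # Accumulate loss * frames so large batches weigh proportionally more.
            self.loss_sum = self.decay * self.loss_sum + batch_loss_sum
            self.frames = self.decay * self.frames + batch_frames

        @property
        def value(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)

With decay=0.995 the effective window is about 1/(1-0.995) = 200 batches; at roughly 15k frames per batch that is ~3M frames, matching the figures logged above.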
2023-11-20 13:13:15,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1100046.6666666667, ans=0.2
2023-11-20 13:13:17,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1100046.6666666667, ans=0.125
2023-11-20 13:13:23,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1100046.6666666667, ans=0.125
2023-11-20 13:13:33,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1100113.3333333333, ans=0.0
2023-11-20 13:13:44,007 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.858e+01 8.298e+01 8.856e+01 9.572e+01 1.265e+02, threshold=1.771e+02, percent-clipped=0.0
2023-11-20 13:14:07,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1100313.3333333333, ans=0.125
2023-11-20 13:14:08,457 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 165050
2023-11-20 13:14:15,769 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.27 vs. limit=15.0
2023-11-20 13:14:20,654 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 8750, loss[loss=0.09821, simple_loss=0.1353, pruned_loss=0.02294, audio_tagging_loss=0.007604, over 14240.00 frames. ], tot_loss[loss=0.07984, simple_loss=0.1008, pruned_loss=0.01939, audio_tagging_loss=0.01005, over 3052570.19 frames. ], batch size: 53, lr: 4.89e-03, grad_scale: 16.0
2023-11-20 13:14:24,999 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.61 vs. limit=15.0
2023-11-20 13:15:00,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1100580.0, ans=0.1
2023-11-20 13:15:13,124 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 165100
2023-11-20 13:15:24,010 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 8800, loss[loss=0.07712, simple_loss=0.09168, pruned_loss=0.01925, audio_tagging_loss=0.01203, over 15368.00 frames. ], tot_loss[loss=0.08062, simple_loss=0.1019, pruned_loss=0.01957, audio_tagging_loss=0.0101, over 3050056.49 frames. ], batch size: 58, lr: 4.89e-03, grad_scale: 32.0
2023-11-20 13:15:24,855 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.14 vs. limit=6.0
2023-11-20 13:15:25,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1100713.3333333333, ans=0.125
2023-11-20 13:15:46,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1100780.0, ans=0.125
2023-11-20 13:15:49,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1100846.6666666667, ans=0.0
2023-11-20 13:15:51,941 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.572e+01 8.177e+01 8.747e+01 9.582e+01 1.210e+02, threshold=1.749e+02, percent-clipped=0.0
2023-11-20 13:15:57,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1100846.6666666667, ans=0.125
2023-11-20 13:16:16,740 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 165150
2023-11-20 13:16:23,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1100980.0, ans=0.125
2023-11-20 13:16:27,609 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 8850, loss[loss=0.07814, simple_loss=0.1044, pruned_loss=0.01788, audio_tagging_loss=0.008082, over 15142.00 frames. ], tot_loss[loss=0.08004, simple_loss=0.1013, pruned_loss=0.01929, audio_tagging_loss=0.01007, over 3047734.24 frames. ], batch size: 61, lr: 4.89e-03, grad_scale: 32.0
2023-11-20 13:16:27,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1101046.6666666667, ans=0.125
2023-11-20 13:16:30,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1101046.6666666667, ans=0.125
2023-11-20 13:16:39,882 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.87 vs. limit=22.5
2023-11-20 13:16:40,511 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 13:16:47,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1101113.3333333333, ans=0.025
2023-11-20 13:17:05,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1101246.6666666667, ans=0.035
2023-11-20 13:17:21,026 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 165200
2023-11-20 13:17:32,333 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 8900, loss[loss=0.07466, simple_loss=0.09308, pruned_loss=0.017, audio_tagging_loss=0.01112, over 15461.00 frames. ], tot_loss[loss=0.08035, simple_loss=0.1021, pruned_loss=0.01942, audio_tagging_loss=0.009863, over 3058764.68 frames. ], batch size: 60, lr: 4.89e-03, grad_scale: 32.0
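The grad_scale field is PyTorch AMP's dynamic loss scale: it is halved when scaled gradients overflow and grown back after a run of clean steps, which is why it wanders between 8.0 and 32.0 across these batches. A minimal self-contained loop showing where the logged value comes from (toy model; fp16 training on a CUDA device assumed):

    import torch

    model = torch.nn.Linear(10, 1).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scaler = torch.cuda.amp.GradScaler()   # grows the scale after a run of clean steps

    for _ in range(100):
        x = torch.randn(8, 10, device="cuda")
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(x).pow(2).mean()
        scaler.scale(loss).backward()
        scaler.step(optimizer)             # skipped internally if gradients overflowed
        scaler.update()                    # halves the scale on overflow, grows it otherwise

    print(scaler.get_scale())              # the value logged as grad_scale above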
2023-11-20 13:17:48,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1101446.6666666667, ans=0.125
2023-11-20 13:17:58,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1101513.3333333333, ans=0.05
2023-11-20 13:18:00,912 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.758e+01 8.398e+01 8.901e+01 9.799e+01 1.599e+02, threshold=1.780e+02, percent-clipped=0.0
2023-11-20 13:18:25,671 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 165250
2023-11-20 13:18:27,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1101646.6666666667, ans=0.0
2023-11-20 13:18:30,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1101646.6666666667, ans=0.125
2023-11-20 13:18:33,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1101646.6666666667, ans=0.1
2023-11-20 13:18:37,247 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 8950, loss[loss=0.08588, simple_loss=0.1032, pruned_loss=0.02623, audio_tagging_loss=0.008049, over 14166.00 frames. ], tot_loss[loss=0.08058, simple_loss=0.1022, pruned_loss=0.0197, audio_tagging_loss=0.009762, over 3057287.39 frames. ], batch size: 56, lr: 4.89e-03, grad_scale: 32.0
2023-11-20 13:18:49,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1101780.0, ans=0.125
2023-11-20 13:18:50,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1101780.0, ans=0.125
2023-11-20 13:19:18,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1101913.3333333333, ans=0.125
2023-11-20 13:19:25,490 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.80 vs. limit=6.0
2023-11-20 13:19:29,853 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 165300
2023-11-20 13:19:35,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1101980.0, ans=0.1
2023-11-20 13:19:39,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1101980.0, ans=0.125
2023-11-20 13:19:41,535 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 9000, loss[loss=0.07493, simple_loss=0.09586, pruned_loss=0.01687, audio_tagging_loss=0.01013, over 16433.00 frames. ], tot_loss[loss=0.08091, simple_loss=0.1025, pruned_loss=0.01983, audio_tagging_loss=0.009807, over 3057202.50 frames. ], batch size: 64, lr: 4.88e-03, grad_scale: 32.0
2023-11-20 13:19:41,538 INFO [train_asr.py:1285] (0/4) Computing validation loss
2023-11-20 13:20:20,004 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.4564, 3.6128, 2.5645, 3.7086], device='cuda:0')
2023-11-20 13:20:23,261 INFO [train_asr.py:1294] (0/4) Epoch 14, validation: loss=0.06237, simple_loss=0.05346, pruned_loss=0.005661, audio_tagging_loss=0.02998, over 4681554.00 frames.
2023-11-20 13:20:23,261 INFO [train_asr.py:1295] (0/4) Maximum memory allocated so far is 25925MB
2023-11-20 13:20:30,593 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.78 vs. limit=10.0
2023-11-20 13:20:35,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1102113.3333333333, ans=0.2
2023-11-20 13:20:37,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1102113.3333333333, ans=0.05
2023-11-20 13:20:50,973 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.532e+01 8.335e+01 8.996e+01 9.763e+01 1.376e+02, threshold=1.799e+02, percent-clipped=0.0
2023-11-20 13:21:01,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1102246.6666666667, ans=0.025
2023-11-20 13:21:05,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1102246.6666666667, ans=0.1
2023-11-20 13:21:16,418 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 165350
2023-11-20 13:21:20,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1102313.3333333333, ans=0.125
2023-11-20 13:21:27,337 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 9050, loss[loss=0.06712, simple_loss=0.07412, pruned_loss=0.0193, audio_tagging_loss=0.01076, over 14157.00 frames. ], tot_loss[loss=0.08037, simple_loss=0.1018, pruned_loss=0.01967, audio_tagging_loss=0.009787, over 3058643.92 frames. ], batch size: 54, lr: 4.88e-03, grad_scale: 32.0
2023-11-20 13:21:28,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=1102380.0, ans=15.0
2023-11-20 13:21:39,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1102446.6666666667, ans=0.1
2023-11-20 13:21:49,226 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.64 vs. limit=15.0
2023-11-20 13:21:53,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1102513.3333333333, ans=0.125
2023-11-20 13:22:00,891 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.42 vs. limit=22.5
2023-11-20 13:22:20,853 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 165400
2023-11-20 13:22:32,181 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 9100, loss[loss=0.09104, simple_loss=0.1176, pruned_loss=0.024, audio_tagging_loss=0.008249, over 15710.00 frames. ], tot_loss[loss=0.08069, simple_loss=0.1026, pruned_loss=0.01974, audio_tagging_loss=0.009665, over 3055624.18 frames. ], batch size: 57, lr: 4.88e-03, grad_scale: 32.0
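During the validation pass above, zipformer.py:1873 prints the entropy of each attention head's weight distribution as a diagnostic: values near log(key_len) nats mean almost-uniform attention, values near zero mean the head locks onto single positions. A sketch of that statistic (shapes are illustrative):

    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        # attn: (num_heads, query_len, key_len), rows sum to 1.
        # Returns the mean entropy per head, in nats.
        p = attn.clamp(min=1e-20)
        return -(p * p.log()).sum(dim=-1).mean(dim=-1)

    attn = torch.softmax(torch.randn(4, 50, 50), dim=-1)
    print(attn_weights_entropy(attn))  # ~log(50) ≈ 3.9 per head for near-uniform weights

The four values logged above (4.46, 3.61, 2.56, 3.71) are exactly this kind of per-head summary.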
2023-11-20 13:22:33,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1102713.3333333333, ans=0.0
2023-11-20 13:22:33,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1102713.3333333333, ans=0.125
2023-11-20 13:22:46,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1102780.0, ans=0.125
2023-11-20 13:22:49,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1102780.0, ans=0.1
2023-11-20 13:23:01,055 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.597e+01 8.047e+01 8.709e+01 9.562e+01 1.643e+02, threshold=1.742e+02, percent-clipped=0.0
2023-11-20 13:23:02,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1102846.6666666667, ans=0.07
2023-11-20 13:23:25,796 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 165450
2023-11-20 13:23:36,766 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 9150, loss[loss=0.08546, simple_loss=0.1198, pruned_loss=0.01873, audio_tagging_loss=0.006836, over 14487.00 frames. ], tot_loss[loss=0.08062, simple_loss=0.1025, pruned_loss=0.01972, audio_tagging_loss=0.009622, over 3058968.97 frames. ], batch size: 56, lr: 4.88e-03, grad_scale: 32.0
2023-11-20 13:24:02,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1103180.0, ans=0.0
2023-11-20 13:24:12,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1103180.0, ans=0.125
2023-11-20 13:24:19,277 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.59 vs. limit=15.0
2023-11-20 13:24:19,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1103246.6666666667, ans=0.0
2023-11-20 13:24:30,204 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 165500
2023-11-20 13:24:41,876 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 9200, loss[loss=0.07491, simple_loss=0.08804, pruned_loss=0.01607, audio_tagging_loss=0.01481, over 15232.00 frames. ], tot_loss[loss=0.08107, simple_loss=0.1029, pruned_loss=0.01992, audio_tagging_loss=0.00968, over 3056401.00 frames. ], batch size: 58, lr: 4.88e-03, grad_scale: 32.0
2023-11-20 13:24:57,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1103446.6666666667, ans=0.125
2023-11-20 13:25:01,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1103446.6666666667, ans=0.1
2023-11-20 13:25:07,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1103513.3333333333, ans=0.125
2023-11-20 13:25:10,032 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.657e+01 8.175e+01 8.950e+01 9.913e+01 1.287e+02, threshold=1.790e+02, percent-clipped=0.0
2023-11-20 13:25:32,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1103646.6666666667, ans=0.2
2023-11-20 13:25:34,585 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 165550
2023-11-20 13:25:41,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1103646.6666666667, ans=0.0
2023-11-20 13:25:44,584 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.15 vs. limit=15.0
2023-11-20 13:25:45,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1103713.3333333333, ans=0.1
2023-11-20 13:25:46,410 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 9250, loss[loss=0.07327, simple_loss=0.08418, pruned_loss=0.01947, audio_tagging_loss=0.01171, over 14768.00 frames. ], tot_loss[loss=0.08009, simple_loss=0.1015, pruned_loss=0.01961, audio_tagging_loss=0.009737, over 3057279.03 frames. ], batch size: 55, lr: 4.88e-03, grad_scale: 16.0
2023-11-20 13:25:56,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1103713.3333333333, ans=0.125
2023-11-20 13:25:58,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1103780.0, ans=0.125
2023-11-20 13:26:13,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1103846.6666666667, ans=0.0
2023-11-20 13:26:24,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1103913.3333333333, ans=0.125
2023-11-20 13:26:38,965 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 165600
2023-11-20 13:26:50,981 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 9300, loss[loss=0.08238, simple_loss=0.1102, pruned_loss=0.01933, audio_tagging_loss=0.007974, over 15388.00 frames. ], tot_loss[loss=0.07892, simple_loss=0.1003, pruned_loss=0.01907, audio_tagging_loss=0.009725, over 3059939.26 frames. ], batch size: 59, lr: 4.88e-03, grad_scale: 16.0
2023-11-20 13:26:58,324 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.65 vs. limit=10.0
2023-11-20 13:27:14,557 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.27 vs. limit=15.0
2023-11-20 13:27:15,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1104113.3333333333, ans=0.1
2023-11-20 13:27:21,186 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.442e+01 7.830e+01 8.462e+01 9.599e+01 1.223e+02, threshold=1.692e+02, percent-clipped=0.0
2023-11-20 13:27:33,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1104246.6666666667, ans=10.0
2023-11-20 13:27:42,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1104313.3333333333, ans=0.125
2023-11-20 13:27:44,404 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 165650
2023-11-20 13:27:46,225 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.41 vs. limit=15.0
2023-11-20 13:27:55,266 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 9350, loss[loss=0.08689, simple_loss=0.11, pruned_loss=0.0218, audio_tagging_loss=0.01008, over 14801.00 frames. ], tot_loss[loss=0.07901, simple_loss=0.1005, pruned_loss=0.01909, audio_tagging_loss=0.009682, over 3052886.37 frames. ], batch size: 56, lr: 4.88e-03, grad_scale: 16.0
2023-11-20 13:28:08,596 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.584e-03
2023-11-20 13:28:22,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1104513.3333333333, ans=0.2
2023-11-20 13:28:29,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1104513.3333333333, ans=0.0
2023-11-20 13:28:33,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1104580.0, ans=0.0
2023-11-20 13:28:36,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1104580.0, ans=0.2
2023-11-20 13:28:47,708 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 165700
2023-11-20 13:28:59,388 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.66 vs. limit=15.0
2023-11-20 13:28:59,943 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 9400, loss[loss=0.0691, simple_loss=0.08446, pruned_loss=0.01654, audio_tagging_loss=0.01033, over 15497.00 frames. ], tot_loss[loss=0.07906, simple_loss=0.1002, pruned_loss=0.01918, audio_tagging_loss=0.009794, over 3054478.91 frames. ], batch size: 59, lr: 4.88e-03, grad_scale: 16.0
2023-11-20 13:29:15,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=1104780.0, ans=0.025
2023-11-20 13:29:17,198 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.17 vs. limit=15.0
2023-11-20 13:29:18,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1104780.0, ans=0.0
2023-11-20 13:29:29,271 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.35 vs. limit=15.0
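The lr field decays smoothly within the epoch (4.91e-03 down to 4.88e-03 across these ~2.5k batches), consistent with icefall's Eden schedule, which applies power-law decay in both the batch and the epoch dimension. A sketch; the formula is Eden's published shape, and the constants below (base_lr=0.045, lr_batches=7500, lr_epochs=3.5) are assumptions that reproduce the logged values to within rounding:

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Smooth power-law decay in both batch count and epoch.
        return (base_lr
                * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
                * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)

    # ~4.88e-03, matching the log (epoch indexing may be offset by one):
    print(eden_lr(0.045, batch=165500, epoch=13.0))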
limit=15.0 2023-11-20 13:29:29,611 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.579e+01 8.038e+01 8.701e+01 9.410e+01 1.188e+02, threshold=1.740e+02, percent-clipped=0.0 2023-11-20 13:29:30,275 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.65 vs. limit=15.0 2023-11-20 13:29:36,375 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.16 vs. limit=15.0 2023-11-20 13:29:41,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1104913.3333333333, ans=0.125 2023-11-20 13:29:46,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1104913.3333333333, ans=0.2 2023-11-20 13:29:52,923 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 165750 2023-11-20 13:30:02,338 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 13:30:02,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1104980.0, ans=0.125 2023-11-20 13:30:04,860 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 9450, loss[loss=0.06522, simple_loss=0.08779, pruned_loss=0.01064, audio_tagging_loss=0.01068, over 15188.00 frames. ], tot_loss[loss=0.07961, simple_loss=0.1011, pruned_loss=0.01921, audio_tagging_loss=0.009861, over 3055471.85 frames. ], batch size: 55, lr: 4.88e-03, grad_scale: 16.0 2023-11-20 13:30:57,434 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 165800 2023-11-20 13:31:05,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=1105313.3333333333, ans=10.0 2023-11-20 13:31:09,005 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 9500, loss[loss=0.09874, simple_loss=0.1316, pruned_loss=0.02561, audio_tagging_loss=0.007324, over 14875.00 frames. ], tot_loss[loss=0.07959, simple_loss=0.1008, pruned_loss=0.01922, audio_tagging_loss=0.009968, over 3051303.68 frames. ], batch size: 56, lr: 4.88e-03, grad_scale: 16.0 2023-11-20 13:31:09,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1105380.0, ans=0.125 2023-11-20 13:31:13,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1105380.0, ans=0.125 2023-11-20 13:31:38,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1105513.3333333333, ans=0.125 2023-11-20 13:31:39,026 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.072e+01 8.167e+01 8.714e+01 9.690e+01 1.183e+02, threshold=1.743e+02, percent-clipped=0.0 2023-11-20 13:31:52,383 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.27 vs. 
limit=10.0 2023-11-20 13:31:59,675 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.60 vs. limit=22.5 2023-11-20 13:32:01,614 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 165850 2023-11-20 13:32:12,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1105713.3333333333, ans=0.125 2023-11-20 13:32:12,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1105713.3333333333, ans=0.0 2023-11-20 13:32:13,247 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 9550, loss[loss=0.09708, simple_loss=0.1169, pruned_loss=0.02704, audio_tagging_loss=0.01156, over 14568.00 frames. ], tot_loss[loss=0.07969, simple_loss=0.1006, pruned_loss=0.01931, audio_tagging_loss=0.01009, over 3043477.98 frames. ], batch size: 54, lr: 4.88e-03, grad_scale: 16.0 2023-11-20 13:32:20,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=1105713.3333333333, ans=0.5 2023-11-20 13:32:44,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1105846.6666666667, ans=0.125 2023-11-20 13:32:49,652 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.47 vs. limit=22.5 2023-11-20 13:32:59,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1105913.3333333333, ans=0.125 2023-11-20 13:33:05,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1105980.0, ans=0.1 2023-11-20 13:33:06,213 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 165900 2023-11-20 13:33:17,799 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 9600, loss[loss=0.07049, simple_loss=0.08585, pruned_loss=0.01686, audio_tagging_loss=0.0107, over 14491.00 frames. ], tot_loss[loss=0.07958, simple_loss=0.1002, pruned_loss=0.01929, audio_tagging_loss=0.01019, over 3049643.34 frames. ], batch size: 55, lr: 4.88e-03, grad_scale: 32.0 2023-11-20 13:33:24,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1106046.6666666667, ans=0.2 2023-11-20 13:33:41,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1106180.0, ans=0.05 2023-11-20 13:33:45,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1106180.0, ans=0.0 2023-11-20 13:33:48,243 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.832e+01 8.385e+01 9.134e+01 1.022e+02 1.365e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-20 13:33:56,757 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.65 vs. 
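
In the optim.py:476 records, the five grad-norm quartiles are a min/Q1/median/Q3/max summary of recent gradient norms, and in every record in this section the threshold equals Clipping_scale times the median (e.g. 2.0 * 9.134e+01 = 1.827e+02 just above), so the clipping level adapts to the recent norm distribution. percent-clipped is the share of recent batches that exceeded the threshold; it is 0.0 almost everywhere here, with a single clipped batch later in this section showing percent-clipped=1.0. A hedged sketch of that scheme, where the window size and bookkeeping are assumptions:

    from collections import deque
    import torch

    class MedianGradClipper:
        def __init__(self, clipping_scale=2.0, window=50):
            self.scale = clipping_scale
            self.norms = deque(maxlen=window)   # recent gradient norms

        def clip_(self, params):
            params = [p for p in params if p.grad is not None]
            norm = torch.sqrt(sum((p.grad ** 2).sum() for p in params))
            self.norms.append(norm.item())
            hist = torch.tensor(list(self.norms))
            quartiles = torch.quantile(
                hist, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.scale * quartiles[2].item()   # scale * median
            if norm.item() > threshold:
                for p in params:
                    p.grad.mul_(threshold / norm)
            return quartiles, threshold
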
limit=10.0 2023-11-20 13:33:58,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1106246.6666666667, ans=0.1 2023-11-20 13:34:10,113 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 165950 2023-11-20 13:34:12,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1106313.3333333333, ans=0.5 2023-11-20 13:34:15,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1106313.3333333333, ans=0.05 2023-11-20 13:34:18,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1106313.3333333333, ans=0.125 2023-11-20 13:34:21,509 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 9650, loss[loss=0.07086, simple_loss=0.09218, pruned_loss=0.01497, audio_tagging_loss=0.009806, over 15789.00 frames. ], tot_loss[loss=0.07904, simple_loss=0.09949, pruned_loss=0.01911, audio_tagging_loss=0.01019, over 3052846.99 frames. ], batch size: 59, lr: 4.88e-03, grad_scale: 16.0 2023-11-20 13:34:31,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1106380.0, ans=0.95 2023-11-20 13:34:52,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1106513.3333333333, ans=0.125 2023-11-20 13:34:55,633 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.28 vs. limit=15.0 2023-11-20 13:35:05,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1106580.0, ans=0.125 2023-11-20 13:35:05,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1106580.0, ans=0.125 2023-11-20 13:35:14,149 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 166000 2023-11-20 13:35:16,266 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.40 vs. limit=15.0 2023-11-20 13:35:25,859 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 9700, loss[loss=0.07895, simple_loss=0.1044, pruned_loss=0.01837, audio_tagging_loss=0.008397, over 15203.00 frames. ], tot_loss[loss=0.07856, simple_loss=0.09892, pruned_loss=0.019, audio_tagging_loss=0.0101, over 3047853.25 frames. 
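
The scaling.py:213 records that dominate this log are scheduled hyperparameters: dropout probabilities, skip rates, balancer limits and similar regularizers whose current value (ans=...) is a function of the global batch_count rather than a constant. A minimal sketch of such a piecewise-linear schedule; the breakpoints below are illustrative, not this run's:

    from bisect import bisect_right

    class PiecewiseLinearSchedule:
        """Linearly interpolates between (batch_count, value) breakpoints,
        clamping at both ends, as the ScheduledFloat lines suggest."""

        def __init__(self, *points):
            self.xs = [x for x, _ in points]
            self.ys = [y for _, y in points]

        def __call__(self, batch_count: float) -> float:
            i = bisect_right(self.xs, batch_count)
            if i == 0:
                return self.ys[0]
            if i == len(self.xs):
                return self.ys[-1]
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    dropout_p = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p(1106313.3333333333))  # far past the last breakpoint -> 0.1
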
], batch size: 57, lr: 4.87e-03, grad_scale: 16.0 2023-11-20 13:35:26,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1106713.3333333333, ans=0.1 2023-11-20 13:35:36,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1106713.3333333333, ans=0.125 2023-11-20 13:35:50,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1106846.6666666667, ans=0.125 2023-11-20 13:35:57,241 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.187e+01 8.103e+01 9.034e+01 9.824e+01 1.276e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-20 13:36:11,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1106913.3333333333, ans=0.125 2023-11-20 13:36:18,841 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 166050 2023-11-20 13:36:22,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1106980.0, ans=0.2 2023-11-20 13:36:31,014 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 9750, loss[loss=0.0657, simple_loss=0.0801, pruned_loss=0.01195, audio_tagging_loss=0.0137, over 15085.00 frames. ], tot_loss[loss=0.07881, simple_loss=0.09943, pruned_loss=0.01914, audio_tagging_loss=0.00996, over 3045338.14 frames. ], batch size: 57, lr: 4.87e-03, grad_scale: 16.0 2023-11-20 13:36:42,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1107113.3333333333, ans=0.125 2023-11-20 13:37:02,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1107180.0, ans=0.125 2023-11-20 13:37:10,394 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.88 vs. limit=22.5 2023-11-20 13:37:12,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1107246.6666666667, ans=0.125 2023-11-20 13:37:22,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1107313.3333333333, ans=0.125 2023-11-20 13:37:24,257 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 166100 2023-11-20 13:37:35,955 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 9800, loss[loss=0.08441, simple_loss=0.1076, pruned_loss=0.01801, audio_tagging_loss=0.01259, over 15853.00 frames. ], tot_loss[loss=0.07835, simple_loss=0.09886, pruned_loss=0.01892, audio_tagging_loss=0.01001, over 3040936.17 frames. 
], batch size: 58, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:37:51,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1107446.6666666667, ans=0.125 2023-11-20 13:37:52,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1107446.6666666667, ans=0.07 2023-11-20 13:37:57,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1107446.6666666667, ans=0.0 2023-11-20 13:38:07,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1107513.3333333333, ans=0.0 2023-11-20 13:38:07,939 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.608e+01 8.297e+01 9.086e+01 9.730e+01 1.369e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-20 13:38:26,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1107646.6666666667, ans=0.2 2023-11-20 13:38:28,901 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 166150 2023-11-20 13:38:30,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1107646.6666666667, ans=0.2 2023-11-20 13:38:32,590 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 13:38:40,587 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 9850, loss[loss=0.07568, simple_loss=0.09119, pruned_loss=0.01821, audio_tagging_loss=0.01187, over 14726.00 frames. ], tot_loss[loss=0.07825, simple_loss=0.09874, pruned_loss=0.01897, audio_tagging_loss=0.009906, over 3042417.06 frames. ], batch size: 54, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:38:51,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1107780.0, ans=0.0 2023-11-20 13:39:00,706 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.39 vs. 
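
The grad_scale field in the loss records is the fp16 dynamic loss scale, which is why it moves between 16.0, 32.0 and, just above, 8.0 across this stretch: the scaler grows the factor after a run of overflow-free steps and halves it whenever a non-finite gradient appears. A generic torch.cuda.amp version of that loop; the model and optimizer are placeholders and the init/growth settings are illustrative:

    import torch
    from torch.cuda.amp import GradScaler, autocast

    scaler = GradScaler(init_scale=16.0, growth_factor=2.0,
                        backoff_factor=0.5, growth_interval=2000)

    def train_step(model, optimizer, batch):
        optimizer.zero_grad()
        with autocast():
            loss = model(batch)
        scaler.scale(loss).backward()   # backward on the scaled loss
        scaler.step(optimizer)          # skipped if gradients overflowed
        scaler.update()                 # halve on overflow, grow when stable
        return loss.detach(), scaler.get_scale()   # the logged grad_scale
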
limit=15.0 2023-11-20 13:39:10,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1107846.6666666667, ans=0.035 2023-11-20 13:39:13,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1107846.6666666667, ans=0.125 2023-11-20 13:39:19,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1107913.3333333333, ans=0.0 2023-11-20 13:39:20,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1107913.3333333333, ans=0.125 2023-11-20 13:39:23,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1107913.3333333333, ans=0.0 2023-11-20 13:39:33,727 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 166200 2023-11-20 13:39:45,710 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 9900, loss[loss=0.08136, simple_loss=0.1122, pruned_loss=0.01752, audio_tagging_loss=0.00776, over 15609.00 frames. ], tot_loss[loss=0.07893, simple_loss=0.1003, pruned_loss=0.01908, audio_tagging_loss=0.009681, over 3049831.76 frames. ], batch size: 59, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:39:49,095 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.18 vs. limit=15.0 2023-11-20 13:39:56,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=1108046.6666666667, ans=0.05 2023-11-20 13:40:02,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1108113.3333333333, ans=0.0 2023-11-20 13:40:05,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1108113.3333333333, ans=0.125 2023-11-20 13:40:05,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1108113.3333333333, ans=0.125 2023-11-20 13:40:17,103 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.73 vs. limit=15.0 2023-11-20 13:40:18,347 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.742e+01 8.087e+01 8.695e+01 9.650e+01 1.416e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-20 13:40:21,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1108180.0, ans=0.125 2023-11-20 13:40:29,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1108246.6666666667, ans=0.2 2023-11-20 13:40:38,860 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 166250 2023-11-20 13:40:51,328 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 9950, loss[loss=0.07424, simple_loss=0.09906, pruned_loss=0.01676, audio_tagging_loss=0.007956, over 13886.00 frames. ], tot_loss[loss=0.07825, simple_loss=0.09958, pruned_loss=0.01879, audio_tagging_loss=0.009668, over 3050977.02 frames. 
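
The scaling.py:1022 Whitening records are a diagnostic on how "white" a module's output is: a metric computed from the channel covariance is compared against a limit, and a corrective gradient only engages when the metric exceeds that limit, which is why most records here sit below it. One metric with the logged behaviour (an assumption, not necessarily the exact formula): it equals 1.0 for perfectly white features and grows as variance concentrates in a few directions.

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels). Returns
        # num_channels * sum(eig^2) / (sum(eig))^2 of the covariance:
        # 1.0 when all eigenvalues are equal (white), larger otherwise.
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return (x.shape[1] * (eigs ** 2).sum() / eigs.sum() ** 2).item()

    feats = torch.randn(1000, 384)
    print(whitening_metric(feats))      # ~1.0, well under a limit of 15.0
    feats[:, 0] *= 30.0                 # concentrate variance in one channel
    print(whitening_metric(feats))      # grows far past the limit
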
], batch size: 52, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:41:29,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1108580.0, ans=0.125 2023-11-20 13:41:32,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1108580.0, ans=0.125 2023-11-20 13:41:44,167 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 166300 2023-11-20 13:41:44,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1108646.6666666667, ans=0.125 2023-11-20 13:41:55,020 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 10000, loss[loss=0.0868, simple_loss=0.1108, pruned_loss=0.0237, audio_tagging_loss=0.007725, over 15847.00 frames. ], tot_loss[loss=0.07815, simple_loss=0.09936, pruned_loss=0.01875, audio_tagging_loss=0.009728, over 3052419.60 frames. ], batch size: 58, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:41:55,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1108713.3333333333, ans=0.125 2023-11-20 13:42:24,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1108846.6666666667, ans=0.125 2023-11-20 13:42:26,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1108846.6666666667, ans=0.1 2023-11-20 13:42:29,448 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.616e+01 8.105e+01 8.776e+01 9.451e+01 1.209e+02, threshold=1.755e+02, percent-clipped=0.0 2023-11-20 13:42:31,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1108846.6666666667, ans=0.125 2023-11-20 13:42:48,537 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 166350 2023-11-20 13:42:59,235 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 10050, loss[loss=0.07356, simple_loss=0.09604, pruned_loss=0.01519, audio_tagging_loss=0.01035, over 15086.00 frames. ], tot_loss[loss=0.07823, simple_loss=0.09927, pruned_loss=0.01887, audio_tagging_loss=0.009717, over 3050865.22 frames. ], batch size: 59, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:43:03,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1109046.6666666667, ans=0.2 2023-11-20 13:43:08,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1109046.6666666667, ans=0.125 2023-11-20 13:43:22,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1109113.3333333333, ans=0.0 2023-11-20 13:43:32,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1109180.0, ans=0.125 2023-11-20 13:43:52,158 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 166400 2023-11-20 13:43:57,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1109313.3333333333, ans=0.0 2023-11-20 13:44:03,884 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 10100, loss[loss=0.09709, simple_loss=0.1163, pruned_loss=0.026, audio_tagging_loss=0.01293, over 15741.00 frames. 
], tot_loss[loss=0.07878, simple_loss=0.09977, pruned_loss=0.01913, audio_tagging_loss=0.009767, over 3049091.77 frames. ], batch size: 58, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:44:22,633 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.52 vs. limit=22.5 2023-11-20 13:44:37,153 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.486e+01 8.086e+01 8.697e+01 9.512e+01 1.226e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-20 13:44:44,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1109580.0, ans=0.0 2023-11-20 13:44:46,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1109580.0, ans=0.125 2023-11-20 13:44:52,128 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.59 vs. limit=15.0 2023-11-20 13:44:56,181 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 13:44:57,529 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 166450 2023-11-20 13:45:02,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1109646.6666666667, ans=0.1 2023-11-20 13:45:03,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1109646.6666666667, ans=0.125 2023-11-20 13:45:08,361 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 10150, loss[loss=0.09032, simple_loss=0.1115, pruned_loss=0.02324, audio_tagging_loss=0.01135, over 16242.00 frames. ], tot_loss[loss=0.07937, simple_loss=0.1003, pruned_loss=0.01937, audio_tagging_loss=0.009843, over 3050137.59 frames. ], batch size: 59, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:45:23,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1109780.0, ans=0.0 2023-11-20 13:45:37,518 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
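
The lr field decays slowly across this section (4.88e-03 down to 4.85e-03) because it follows an Eden-style schedule, a product of a batch-count factor and an epoch factor. The sketch below uses this run's configured base LR and schedule periods and feeds the epoch in 0-indexed, an assumption that reproduces the logged values closely:

    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        # Product of two decay factors, each ~1 early and ~x^-0.5 late.
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # ~4.88e-03, matching the records near batch idx 165650 earlier in
    # this section:
    print(eden_lr(0.045, batch=165650, epoch=13))
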
Number of tokens: 24 2023-11-20 13:45:43,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1109846.6666666667, ans=0.1 2023-11-20 13:45:46,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1109913.3333333333, ans=0.125 2023-11-20 13:45:48,149 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.77 vs. limit=22.5 2023-11-20 13:45:59,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1109980.0, ans=0.1 2023-11-20 13:45:59,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1109980.0, ans=0.1 2023-11-20 13:46:00,876 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 166500 2023-11-20 13:46:05,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1109980.0, ans=0.125 2023-11-20 13:46:12,418 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 10200, loss[loss=0.07248, simple_loss=0.09432, pruned_loss=0.01713, audio_tagging_loss=0.00819, over 14963.00 frames. ], tot_loss[loss=0.07979, simple_loss=0.1006, pruned_loss=0.01946, audio_tagging_loss=0.01002, over 3052556.63 frames. ], batch size: 56, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:46:23,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1110046.6666666667, ans=0.125 2023-11-20 13:46:28,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1110113.3333333333, ans=0.125 2023-11-20 13:46:36,408 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 13:46:38,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1110180.0, ans=0.125 2023-11-20 13:46:46,112 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.930e+01 8.131e+01 8.850e+01 9.665e+01 1.277e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-20 13:46:56,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1110246.6666666667, ans=0.0 2023-11-20 13:46:56,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1110246.6666666667, ans=0.0 2023-11-20 13:47:05,116 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 166550 2023-11-20 13:47:15,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1110380.0, ans=0.05 2023-11-20 13:47:16,464 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 10250, loss[loss=0.1022, simple_loss=0.1285, pruned_loss=0.02731, audio_tagging_loss=0.01061, over 14962.00 frames. 
], tot_loss[loss=0.07998, simple_loss=0.101, pruned_loss=0.01942, audio_tagging_loss=0.01007, over 3057234.21 frames. ], batch size: 56, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:47:16,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1110380.0, ans=0.1 2023-11-20 13:47:37,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1110446.6666666667, ans=0.125 2023-11-20 13:47:47,442 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.79 vs. limit=6.0 2023-11-20 13:47:54,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1110580.0, ans=0.0 2023-11-20 13:48:09,969 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 166600 2023-11-20 13:48:18,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1110646.6666666667, ans=0.0 2023-11-20 13:48:21,980 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 10300, loss[loss=0.07058, simple_loss=0.08895, pruned_loss=0.01597, audio_tagging_loss=0.01013, over 15655.00 frames. ], tot_loss[loss=0.07921, simple_loss=0.09964, pruned_loss=0.01922, audio_tagging_loss=0.01017, over 3057026.08 frames. ], batch size: 59, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:48:52,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1110846.6666666667, ans=0.0 2023-11-20 13:48:54,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1110846.6666666667, ans=0.125 2023-11-20 13:48:55,307 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.311e+01 8.084e+01 8.693e+01 9.702e+01 1.335e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-20 13:49:15,247 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 166650 2023-11-20 13:49:24,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=1110980.0, ans=0.05 2023-11-20 13:49:25,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1111046.6666666667, ans=0.025 2023-11-20 13:49:26,760 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 10350, loss[loss=0.08306, simple_loss=0.1013, pruned_loss=0.01992, audio_tagging_loss=0.01247, over 14598.00 frames. ], tot_loss[loss=0.07926, simple_loss=0.09952, pruned_loss=0.01916, audio_tagging_loss=0.01034, over 3048305.68 frames. ], batch size: 54, lr: 4.86e-03, grad_scale: 8.0 2023-11-20 13:49:37,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1111046.6666666667, ans=0.2 2023-11-20 13:49:38,857 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.91 vs. 
limit=15.0 2023-11-20 13:49:42,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1111113.3333333333, ans=0.125 2023-11-20 13:49:43,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1111113.3333333333, ans=0.125 2023-11-20 13:49:47,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1111113.3333333333, ans=0.125 2023-11-20 13:49:49,537 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.06 vs. limit=15.0 2023-11-20 13:49:50,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1111113.3333333333, ans=0.125 2023-11-20 13:49:56,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1111180.0, ans=0.125 2023-11-20 13:49:59,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1111180.0, ans=0.125 2023-11-20 13:50:05,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1111246.6666666667, ans=0.125 2023-11-20 13:50:11,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1111246.6666666667, ans=0.125 2023-11-20 13:50:14,246 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.63 vs. limit=10.0 2023-11-20 13:50:19,516 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 166700 2023-11-20 13:50:22,120 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 13:50:25,658 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.28 vs. limit=15.0 2023-11-20 13:50:31,143 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 10400, loss[loss=0.06813, simple_loss=0.08696, pruned_loss=0.01702, audio_tagging_loss=0.007627, over 14165.00 frames. ], tot_loss[loss=0.07854, simple_loss=0.09848, pruned_loss=0.01891, audio_tagging_loss=0.01039, over 3047269.69 frames. ], batch size: 53, lr: 4.86e-03, grad_scale: 16.0 2023-11-20 13:50:36,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1111380.0, ans=0.07 2023-11-20 13:50:38,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1111380.0, ans=0.0 2023-11-20 13:50:38,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1111380.0, ans=0.125 2023-11-20 13:50:42,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1111380.0, ans=0.125 2023-11-20 13:50:46,658 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.21 vs. 
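
The scaling.py:1118 WithLoss records track small auxiliary losses attached directly to intermediate tensors (here self-attention weights); a loss-sum of 0.000e+00 simply means the tensor is currently inside its allowed range. A hedged sketch of the general mechanism: the forward value passes through unchanged, and the attached scalar receives gradient 1.0, so it joins the training objective without showing up in the reported loss. The limit and weight below are illustrative.

    import torch

    class WithLoss(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, aux_loss):
            ctx.loss_shape = aux_loss.shape
            return x

        @staticmethod
        def backward(ctx, grad_out):
            # The activation's gradient passes through; the auxiliary loss
            # gets gradient 1, as if it had been added to the total loss.
            return grad_out, torch.ones(ctx.loss_shape,
                                        dtype=grad_out.dtype,
                                        device=grad_out.device)

    attn_weights = torch.randn(8, 50, requires_grad=True)
    # e.g. penalize attention-weight magnitudes above some limit:
    aux = (attn_weights.abs() - 5.0).clamp(min=0.0).sum() * 1.0e-4
    out = WithLoss.apply(attn_weights, aux)   # aux is the logged "loss-sum"
    out.sum().backward()
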
limit=10.0 2023-11-20 13:50:47,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1111446.6666666667, ans=0.0 2023-11-20 13:51:00,750 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.15 vs. limit=10.0 2023-11-20 13:51:03,382 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.15 vs. limit=12.0 2023-11-20 13:51:05,059 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.857e+01 8.019e+01 8.655e+01 9.452e+01 1.304e+02, threshold=1.731e+02, percent-clipped=0.0 2023-11-20 13:51:10,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=1111580.0, ans=0.2 2023-11-20 13:51:24,401 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 166750 2023-11-20 13:51:36,021 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 10450, loss[loss=0.09695, simple_loss=0.1182, pruned_loss=0.02921, audio_tagging_loss=0.00866, over 15293.00 frames. ], tot_loss[loss=0.07984, simple_loss=0.1002, pruned_loss=0.01959, audio_tagging_loss=0.01017, over 3052316.11 frames. ], batch size: 54, lr: 4.86e-03, grad_scale: 16.0 2023-11-20 13:52:21,085 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.99 vs. limit=12.0 2023-11-20 13:52:29,657 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 166800 2023-11-20 13:52:41,522 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 10500, loss[loss=0.06838, simple_loss=0.0805, pruned_loss=0.01555, audio_tagging_loss=0.01258, over 15159.00 frames. ], tot_loss[loss=0.07924, simple_loss=0.09967, pruned_loss=0.01929, audio_tagging_loss=0.01012, over 3048607.99 frames. ], batch size: 56, lr: 4.86e-03, grad_scale: 16.0 2023-11-20 13:52:45,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1112046.6666666667, ans=0.125 2023-11-20 13:53:14,862 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.848e+01 8.112e+01 8.724e+01 9.287e+01 1.188e+02, threshold=1.745e+02, percent-clipped=0.0 2023-11-20 13:53:34,588 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 166850 2023-11-20 13:53:40,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1112313.3333333333, ans=0.125 2023-11-20 13:53:45,955 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 10550, loss[loss=0.09913, simple_loss=0.1328, pruned_loss=0.02407, audio_tagging_loss=0.008671, over 15590.00 frames. ], tot_loss[loss=0.07944, simple_loss=0.1001, pruned_loss=0.01942, audio_tagging_loss=0.009986, over 3050444.70 frames. 
], batch size: 57, lr: 4.86e-03, grad_scale: 16.0 2023-11-20 13:54:18,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1112513.3333333333, ans=0.125 2023-11-20 13:54:26,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1112580.0, ans=0.125 2023-11-20 13:54:27,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1112580.0, ans=0.0 2023-11-20 13:54:31,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1112580.0, ans=0.125 2023-11-20 13:54:36,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1112646.6666666667, ans=0.125 2023-11-20 13:54:38,999 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 166900 2023-11-20 13:54:43,531 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.97 vs. limit=22.5 2023-11-20 13:54:45,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1112646.6666666667, ans=0.025 2023-11-20 13:54:49,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1112713.3333333333, ans=0.0 2023-11-20 13:54:50,576 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 10600, loss[loss=0.06187, simple_loss=0.07795, pruned_loss=0.01438, audio_tagging_loss=0.008517, over 16154.00 frames. ], tot_loss[loss=0.07922, simple_loss=0.09967, pruned_loss=0.01938, audio_tagging_loss=0.01001, over 3049451.66 frames. ], batch size: 60, lr: 4.86e-03, grad_scale: 16.0 2023-11-20 13:55:03,200 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.40 vs. limit=12.0 2023-11-20 13:55:06,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1112780.0, ans=0.125 2023-11-20 13:55:14,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1112780.0, ans=0.2 2023-11-20 13:55:16,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1112846.6666666667, ans=0.0 2023-11-20 13:55:24,051 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.162e+01 8.206e+01 8.903e+01 9.867e+01 1.464e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-20 13:55:43,402 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 166950 2023-11-20 13:55:55,874 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 10650, loss[loss=0.08174, simple_loss=0.1018, pruned_loss=0.02029, audio_tagging_loss=0.01054, over 14209.00 frames. ], tot_loss[loss=0.07963, simple_loss=0.1002, pruned_loss=0.01959, audio_tagging_loss=0.009924, over 3046735.82 frames. 
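
The model.py:792 status line fires every 50 training batches and reports whether the encoder is currently frozen; it stays False throughout this run, where the whole network trains. A sketch of the gate it implies, with freezing active only below some step threshold; the parameter names are placeholders, and a threshold of -1 disables freezing entirely:

    import logging

    def maybe_freeze_encoder(encoder, batch_idx, freeze_steps=-1, log_every=50):
        freeze = 0 <= batch_idx < freeze_steps
        for p in encoder.parameters():
            p.requires_grad = not freeze
        if batch_idx % log_every == 0:
            logging.info(
                f"Freeze_encoder: {freeze}; Current batch idx: {batch_idx}")
        return freeze
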
], batch size: 54, lr: 4.86e-03, grad_scale: 16.0 2023-11-20 13:55:57,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1113046.6666666667, ans=0.2 2023-11-20 13:56:05,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1113046.6666666667, ans=0.1 2023-11-20 13:56:12,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1113113.3333333333, ans=0.2 2023-11-20 13:56:32,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1113246.6666666667, ans=0.0 2023-11-20 13:56:44,419 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.43 vs. limit=22.5 2023-11-20 13:56:48,715 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 167000 2023-11-20 13:56:50,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1113313.3333333333, ans=0.0 2023-11-20 13:57:00,546 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 10700, loss[loss=0.08016, simple_loss=0.1057, pruned_loss=0.0188, audio_tagging_loss=0.008495, over 15133.00 frames. ], tot_loss[loss=0.07953, simple_loss=0.1003, pruned_loss=0.01949, audio_tagging_loss=0.009905, over 3055573.59 frames. ], batch size: 60, lr: 4.86e-03, grad_scale: 16.0 2023-11-20 13:57:04,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1113380.0, ans=0.1 2023-11-20 13:57:30,656 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.30 vs. limit=22.5 2023-11-20 13:57:34,215 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.075e+01 8.061e+01 8.803e+01 9.456e+01 1.141e+02, threshold=1.761e+02, percent-clipped=0.0 2023-11-20 13:57:40,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1113580.0, ans=0.125 2023-11-20 13:57:43,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1113580.0, ans=0.125 2023-11-20 13:57:53,741 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 167050 2023-11-20 13:58:05,328 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 10750, loss[loss=0.07642, simple_loss=0.09817, pruned_loss=0.01725, audio_tagging_loss=0.01008, over 16438.00 frames. ], tot_loss[loss=0.07901, simple_loss=0.09969, pruned_loss=0.01924, audio_tagging_loss=0.009931, over 3055132.32 frames. ], batch size: 61, lr: 4.86e-03, grad_scale: 16.0 2023-11-20 13:58:14,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1113713.3333333333, ans=0.0 2023-11-20 13:58:15,676 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.84 vs. 
limit=15.0 2023-11-20 13:58:16,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1113780.0, ans=0.125 2023-11-20 13:58:22,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1113780.0, ans=0.125 2023-11-20 13:58:36,217 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=4.835e-01 2023-11-20 13:58:47,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1113913.3333333333, ans=0.0 2023-11-20 13:58:57,888 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 167100 2023-11-20 13:59:01,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1113980.0, ans=0.125 2023-11-20 13:59:09,692 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 10800, loss[loss=0.06481, simple_loss=0.07615, pruned_loss=0.01562, audio_tagging_loss=0.01112, over 14069.00 frames. ], tot_loss[loss=0.07808, simple_loss=0.09846, pruned_loss=0.01894, audio_tagging_loss=0.009907, over 3049067.59 frames. ], batch size: 56, lr: 4.86e-03, grad_scale: 32.0 2023-11-20 13:59:11,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1114046.6666666667, ans=0.125 2023-11-20 13:59:43,616 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.204e+01 8.350e+01 8.974e+01 9.650e+01 1.251e+02, threshold=1.795e+02, percent-clipped=0.0 2023-11-20 13:59:58,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1114246.6666666667, ans=0.2 2023-11-20 14:00:03,061 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 167150 2023-11-20 14:00:14,909 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 10850, loss[loss=0.09977, simple_loss=0.1379, pruned_loss=0.0232, audio_tagging_loss=0.007616, over 15417.00 frames. ], tot_loss[loss=0.07845, simple_loss=0.09894, pruned_loss=0.01905, audio_tagging_loss=0.009931, over 3046120.23 frames. ], batch size: 56, lr: 4.86e-03, grad_scale: 32.0 2023-11-20 14:00:23,793 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.68 vs. limit=15.0 2023-11-20 14:00:41,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1114513.3333333333, ans=0.0 2023-11-20 14:00:48,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1114513.3333333333, ans=0.05 2023-11-20 14:01:08,159 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 167200 2023-11-20 14:01:10,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1114646.6666666667, ans=0.09899494936611666 2023-11-20 14:01:14,734 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 14:01:20,242 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 10900, loss[loss=0.09441, simple_loss=0.1238, pruned_loss=0.02358, audio_tagging_loss=0.008904, over 15214.00 frames. ], tot_loss[loss=0.0792, simple_loss=0.1003, pruned_loss=0.01918, audio_tagging_loss=0.0099, over 3049814.52 frames. ], batch size: 56, lr: 4.86e-03, grad_scale: 32.0 2023-11-20 14:01:23,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1114713.3333333333, ans=0.125 2023-11-20 14:01:37,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1114780.0, ans=0.0 2023-11-20 14:01:45,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1114846.6666666667, ans=0.1 2023-11-20 14:01:47,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1114846.6666666667, ans=0.125 2023-11-20 14:01:50,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1114846.6666666667, ans=0.125 2023-11-20 14:01:53,714 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.172e+01 8.152e+01 8.794e+01 9.597e+01 1.232e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-20 14:02:11,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1114980.0, ans=0.0 2023-11-20 14:02:13,341 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 167250 2023-11-20 14:02:20,238 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.50 vs. limit=15.0 2023-11-20 14:02:24,259 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 10950, loss[loss=0.08824, simple_loss=0.1029, pruned_loss=0.02674, audio_tagging_loss=0.01004, over 15270.00 frames. ], tot_loss[loss=0.07942, simple_loss=0.1004, pruned_loss=0.01936, audio_tagging_loss=0.009882, over 3050580.25 frames. ], batch size: 58, lr: 4.86e-03, grad_scale: 32.0 2023-11-20 14:02:32,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1115046.6666666667, ans=0.95 2023-11-20 14:02:48,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1115113.3333333333, ans=0.125 2023-11-20 14:02:51,128 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.99 vs. limit=15.0 2023-11-20 14:03:17,754 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 167300 2023-11-20 14:03:19,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1115313.3333333333, ans=0.0 2023-11-20 14:03:25,702 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. 
limit=6.0 2023-11-20 14:03:29,246 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 11000, loss[loss=0.09604, simple_loss=0.1166, pruned_loss=0.02841, audio_tagging_loss=0.009334, over 14509.00 frames. ], tot_loss[loss=0.07948, simple_loss=0.1005, pruned_loss=0.01928, audio_tagging_loss=0.009963, over 3047133.12 frames. ], batch size: 53, lr: 4.86e-03, grad_scale: 32.0 2023-11-20 14:03:38,496 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 14:03:42,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1115446.6666666667, ans=0.125 2023-11-20 14:03:43,885 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.81 vs. limit=15.0 2023-11-20 14:04:02,036 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.939e+01 8.121e+01 8.892e+01 9.815e+01 1.453e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-20 14:04:22,145 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 167350 2023-11-20 14:04:33,139 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 11050, loss[loss=0.09392, simple_loss=0.1084, pruned_loss=0.02695, audio_tagging_loss=0.01277, over 15173.00 frames. ], tot_loss[loss=0.07896, simple_loss=0.09973, pruned_loss=0.01901, audio_tagging_loss=0.01009, over 3047114.84 frames. ], batch size: 59, lr: 4.85e-03, grad_scale: 32.0 2023-11-20 14:04:45,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1115780.0, ans=0.125 2023-11-20 14:04:53,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1115780.0, ans=0.125 2023-11-20 14:04:54,916 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.83 vs. limit=22.5 2023-11-20 14:05:00,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1115846.6666666667, ans=0.1 2023-11-20 14:05:01,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1115846.6666666667, ans=0.125 2023-11-20 14:05:15,772 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.06 vs. limit=15.0 2023-11-20 14:05:24,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1115980.0, ans=0.2 2023-11-20 14:05:25,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1115980.0, ans=0.1 2023-11-20 14:05:26,813 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 167400 2023-11-20 14:05:32,847 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.67 vs. 
limit=15.0 2023-11-20 14:05:38,017 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 11100, loss[loss=0.09241, simple_loss=0.1173, pruned_loss=0.02399, audio_tagging_loss=0.009744, over 15797.00 frames. ], tot_loss[loss=0.07907, simple_loss=0.09963, pruned_loss=0.01899, audio_tagging_loss=0.01026, over 3047496.56 frames. ], batch size: 57, lr: 4.85e-03, grad_scale: 32.0 2023-11-20 14:06:11,984 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.063e+01 8.381e+01 8.919e+01 9.708e+01 1.297e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-20 14:06:16,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1116246.6666666667, ans=0.0 2023-11-20 14:06:20,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1116246.6666666667, ans=0.125 2023-11-20 14:06:21,646 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 14:06:22,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1116246.6666666667, ans=0.0 2023-11-20 14:06:31,721 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 167450 2023-11-20 14:06:42,801 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 11150, loss[loss=0.09147, simple_loss=0.1165, pruned_loss=0.02535, audio_tagging_loss=0.007893, over 15591.00 frames. ], tot_loss[loss=0.07907, simple_loss=0.09948, pruned_loss=0.01894, audio_tagging_loss=0.01039, over 3048650.83 frames. ], batch size: 56, lr: 4.85e-03, grad_scale: 32.0 2023-11-20 14:06:46,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1116380.0, ans=0.125 2023-11-20 14:06:47,442 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.52 vs. limit=15.0 2023-11-20 14:06:58,281 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.95 vs. limit=22.5 2023-11-20 14:07:03,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1116446.6666666667, ans=0.125 2023-11-20 14:07:06,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1116446.6666666667, ans=0.125 2023-11-20 14:07:31,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1116580.0, ans=0.125 2023-11-20 14:07:35,340 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 167500 2023-11-20 14:07:47,508 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 11200, loss[loss=0.06739, simple_loss=0.08373, pruned_loss=0.01435, audio_tagging_loss=0.01118, over 14614.00 frames. ], tot_loss[loss=0.07839, simple_loss=0.0986, pruned_loss=0.01868, audio_tagging_loss=0.01041, over 3047725.93 frames. 
], batch size: 56, lr: 4.85e-03, grad_scale: 32.0 2023-11-20 14:08:07,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1116780.0, ans=0.04949747468305833 2023-11-20 14:08:19,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1116846.6666666667, ans=0.125 2023-11-20 14:08:20,246 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.974e+01 8.196e+01 8.773e+01 9.585e+01 1.271e+02, threshold=1.755e+02, percent-clipped=0.0 2023-11-20 14:08:25,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1116913.3333333333, ans=0.07 2023-11-20 14:08:40,473 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 167550 2023-11-20 14:08:42,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1116980.0, ans=0.125 2023-11-20 14:08:45,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1116980.0, ans=0.125 2023-11-20 14:08:51,244 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 11250, loss[loss=0.0721, simple_loss=0.09713, pruned_loss=0.01438, audio_tagging_loss=0.009157, over 14480.00 frames. ], tot_loss[loss=0.07809, simple_loss=0.09839, pruned_loss=0.01853, audio_tagging_loss=0.01036, over 3052867.48 frames. ], batch size: 55, lr: 4.85e-03, grad_scale: 16.0 2023-11-20 14:08:56,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1117046.6666666667, ans=0.125 2023-11-20 14:09:16,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1117180.0, ans=0.125 2023-11-20 14:09:44,049 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 167600 2023-11-20 14:09:51,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1117313.3333333333, ans=0.125 2023-11-20 14:09:53,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1117313.3333333333, ans=0.0 2023-11-20 14:09:55,737 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 11300, loss[loss=0.08138, simple_loss=0.1071, pruned_loss=0.01963, audio_tagging_loss=0.008208, over 15937.00 frames. ], tot_loss[loss=0.0776, simple_loss=0.09815, pruned_loss=0.01841, audio_tagging_loss=0.01011, over 3053162.56 frames. ], batch size: 57, lr: 4.85e-03, grad_scale: 16.0 2023-11-20 14:10:27,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1117513.3333333333, ans=0.125 2023-11-20 14:10:30,802 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.679e+01 8.103e+01 8.654e+01 9.341e+01 1.359e+02, threshold=1.731e+02, percent-clipped=0.0 2023-11-20 14:10:42,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1117580.0, ans=0.0 2023-11-20 14:10:48,676 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 167650 2023-11-20 14:11:00,300 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 11350, loss[loss=0.04751, simple_loss=0.05663, pruned_loss=0.008993, audio_tagging_loss=0.0102, over 14414.00 frames. 
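
A side note on two odd-looking scheduled values in this section: ans=0.04949747468305833 (the bypass skip_rate in the records above) and ans=0.09899494936611666 (earlier) are 0.035 * sqrt(2) and 0.07 * sqrt(2) to the full printed precision, so these schedules appear to be defined as round endpoints scaled by sqrt(2). That interpretation is a guess, but the arithmetic is easy to check:

    import math
    print(0.035 * math.sqrt(2))  # ~0.04949747468305833
    print(0.07 * math.sqrt(2))   # ~0.09899494936611666
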
], tot_loss[loss=0.07793, simple_loss=0.09855, pruned_loss=0.01862, audio_tagging_loss=0.01003, over 3057029.48 frames. ], batch size: 58, lr: 4.85e-03, grad_scale: 16.0 2023-11-20 14:11:00,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1117713.3333333333, ans=0.125 2023-11-20 14:11:01,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1117713.3333333333, ans=0.1 2023-11-20 14:11:04,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1117713.3333333333, ans=0.125 2023-11-20 14:11:13,381 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.96 vs. limit=15.0 2023-11-20 14:11:40,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1117913.3333333333, ans=0.125 2023-11-20 14:11:49,693 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.27 vs. limit=22.5 2023-11-20 14:11:52,952 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 167700 2023-11-20 14:12:04,777 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 11400, loss[loss=0.1228, simple_loss=0.1583, pruned_loss=0.03403, audio_tagging_loss=0.009566, over 14922.00 frames. ], tot_loss[loss=0.07797, simple_loss=0.09865, pruned_loss=0.01872, audio_tagging_loss=0.009934, over 3052831.99 frames. ], batch size: 55, lr: 4.85e-03, grad_scale: 16.0 2023-11-20 14:12:08,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1118046.6666666667, ans=0.1 2023-11-20 14:12:09,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1118046.6666666667, ans=0.035 2023-11-20 14:12:33,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1118180.0, ans=0.2 2023-11-20 14:12:39,409 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.780e+01 8.090e+01 8.832e+01 9.724e+01 2.021e+02, threshold=1.766e+02, percent-clipped=1.0 2023-11-20 14:12:42,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=1118246.6666666667, ans=0.1 2023-11-20 14:12:55,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1118313.3333333333, ans=0.1 2023-11-20 14:12:57,885 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 167750 2023-11-20 14:13:09,456 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 11450, loss[loss=0.05012, simple_loss=0.05712, pruned_loss=0.01126, audio_tagging_loss=0.0103, over 15050.00 frames. ], tot_loss[loss=0.07848, simple_loss=0.09928, pruned_loss=0.01896, audio_tagging_loss=0.009885, over 3042256.09 frames. ], batch size: 60, lr: 4.85e-03, grad_scale: 16.0 2023-11-20 14:13:12,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1118380.0, ans=0.2 2023-11-20 14:13:34,086 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.21 vs. 
limit=15.0 2023-11-20 14:14:02,127 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 167800 2023-11-20 14:14:14,030 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 11500, loss[loss=0.08754, simple_loss=0.1181, pruned_loss=0.02285, audio_tagging_loss=0.00564, over 15241.00 frames. ], tot_loss[loss=0.07862, simple_loss=0.09931, pruned_loss=0.01898, audio_tagging_loss=0.009991, over 3042371.52 frames. ], batch size: 56, lr: 4.85e-03, grad_scale: 16.0 2023-11-20 14:14:14,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1118713.3333333333, ans=0.125 2023-11-20 14:14:21,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1118713.3333333333, ans=0.125 2023-11-20 14:14:22,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1118713.3333333333, ans=0.1 2023-11-20 14:14:24,470 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 14:14:24,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1118713.3333333333, ans=0.0 2023-11-20 14:14:40,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1118846.6666666667, ans=0.2 2023-11-20 14:14:45,827 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.57 vs. limit=10.0 2023-11-20 14:14:48,904 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.104e+01 8.299e+01 8.769e+01 9.853e+01 1.208e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-20 14:14:55,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1118913.3333333333, ans=0.125 2023-11-20 14:14:55,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1118913.3333333333, ans=0.125 2023-11-20 14:14:55,949 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.16 vs. limit=15.0 2023-11-20 14:15:04,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1118980.0, ans=0.125 2023-11-20 14:15:07,077 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 167850 2023-11-20 14:15:07,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1118980.0, ans=0.07 2023-11-20 14:15:18,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1119046.6666666667, ans=0.0 2023-11-20 14:15:19,082 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 11550, loss[loss=0.07277, simple_loss=0.09769, pruned_loss=0.01471, audio_tagging_loss=0.009217, over 15150.00 frames. ], tot_loss[loss=0.07832, simple_loss=0.09894, pruned_loss=0.01883, audio_tagging_loss=0.01002, over 3048537.93 frames. 
], batch size: 57, lr: 4.85e-03, grad_scale: 16.0
2023-11-20 14:15:46,234 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.36 vs. limit=15.0
2023-11-20 14:15:55,716 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 14:15:57,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1119246.6666666667, ans=0.125
2023-11-20 14:16:11,721 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 167900
2023-11-20 14:16:14,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1119313.3333333333, ans=0.125
2023-11-20 14:16:16,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1119313.3333333333, ans=0.125
2023-11-20 14:16:18,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1119313.3333333333, ans=0.125
2023-11-20 14:16:23,303 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 11600, loss[loss=0.06886, simple_loss=0.0904, pruned_loss=0.01628, audio_tagging_loss=0.007378, over 15507.00 frames. ], tot_loss[loss=0.07817, simple_loss=0.09906, pruned_loss=0.0187, audio_tagging_loss=0.009938, over 3047806.49 frames. ], batch size: 56, lr: 4.85e-03, grad_scale: 32.0
2023-11-20 14:16:44,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1119446.6666666667, ans=0.125
2023-11-20 14:16:48,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1119513.3333333333, ans=0.125
2023-11-20 14:16:55,861 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.55 vs. limit=15.0
2023-11-20 14:16:57,522 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.656e+01 8.039e+01 8.649e+01 9.262e+01 1.367e+02, threshold=1.730e+02, percent-clipped=0.0
2023-11-20 14:17:14,181 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.81 vs. limit=6.0
2023-11-20 14:17:15,917 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 167950
2023-11-20 14:17:23,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1119646.6666666667, ans=0.2
2023-11-20 14:17:25,119 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0
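The WARNING above drops a one-second placeholder cut: 100 feature frames shrink to 23 after the encoder's subsampling, which is shorter than the 24-token label sequence, and a transducer loss cannot align a label sequence longer than the encoder output. A sketch of such a filter follows; the subsampling formula is an assumption chosen because it reproduces the logged 100 -> 23, and the function names are illustrative, not taken from train_asr.py.

def frames_after_subsampling(num_frames: int) -> int:
    # Convolutional front-end halving the frame rate twice;
    # this particular formula matches the logged 100 -> 23.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, tokens: list) -> bool:
    # Exclude cuts whose encoder output is shorter than the token sequence.
    T = frames_after_subsampling(num_frames)
    if T < len(tokens):
        print(f"Exclude cut: {num_frames} frames -> {T} after subsampling "
              f"< {len(tokens)} tokens")
        return False
    return True

print(keep_cut(100, ["tok"] * 24))  # False, as for the cut excluded above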
2023-11-20 14:17:26,973 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 11650, loss[loss=0.0737, simple_loss=0.09075, pruned_loss=0.01903, audio_tagging_loss=0.009297, over 15549.00 frames. ], tot_loss[loss=0.07758, simple_loss=0.0984, pruned_loss=0.01847, audio_tagging_loss=0.009908, over 3046915.10 frames. ], batch size: 56, lr: 4.85e-03, grad_scale: 32.0
2023-11-20 14:17:37,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1119713.3333333333, ans=0.125
2023-11-20 14:17:44,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1119780.0, ans=0.1
2023-11-20 14:17:45,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1119780.0, ans=0.0
2023-11-20 14:17:45,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1119780.0, ans=0.1
2023-11-20 14:18:12,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1119913.3333333333, ans=0.1
2023-11-20 14:18:20,057 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 168000
2023-11-20 14:18:21,609 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-168000.pt
2023-11-20 14:18:32,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1119980.0, ans=0.125
2023-11-20 14:18:34,838 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 11700, loss[loss=0.07044, simple_loss=0.08268, pruned_loss=0.01503, audio_tagging_loss=0.01407, over 14334.00 frames. ], tot_loss[loss=0.07733, simple_loss=0.09778, pruned_loss=0.01842, audio_tagging_loss=0.01001, over 3043200.16 frames. ], batch size: 55, lr: 4.85e-03, grad_scale: 32.0
2023-11-20 14:18:58,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1120113.3333333333, ans=0.0
2023-11-20 14:18:59,677 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.45 vs. limit=10.0
2023-11-20 14:19:03,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1120180.0, ans=0.125
2023-11-20 14:19:09,470 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.797e+01 8.107e+01 8.645e+01 9.352e+01 1.111e+02, threshold=1.729e+02, percent-clipped=0.0
2023-11-20 14:19:18,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1120246.6666666667, ans=0.0
2023-11-20 14:19:19,914 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.00 vs. limit=22.5
2023-11-20 14:19:27,340 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 168050
2023-11-20 14:19:34,125 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 14:19:39,537 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 11750, loss[loss=0.08289, simple_loss=0.1066, pruned_loss=0.01899, audio_tagging_loss=0.01059, over 15319.00 frames.
], batch size: 55, lr: 4.84e-03, grad_scale: 32.0 2023-11-20 14:20:09,669 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.56 vs. limit=15.0 2023-11-20 14:20:17,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1120580.0, ans=0.2 2023-11-20 14:20:25,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1120580.0, ans=0.1 2023-11-20 14:20:32,712 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 168100 2023-11-20 14:20:43,431 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 11800, loss[loss=0.06819, simple_loss=0.08708, pruned_loss=0.01499, audio_tagging_loss=0.009662, over 13808.00 frames. ], tot_loss[loss=0.07885, simple_loss=0.09972, pruned_loss=0.01907, audio_tagging_loss=0.009923, over 3044307.61 frames. ], batch size: 54, lr: 4.84e-03, grad_scale: 32.0 2023-11-20 14:21:03,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1120780.0, ans=0.125 2023-11-20 14:21:08,709 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.27 vs. limit=15.0 2023-11-20 14:21:13,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1120846.6666666667, ans=0.125 2023-11-20 14:21:15,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1120846.6666666667, ans=0.0 2023-11-20 14:21:19,047 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.990e+01 8.087e+01 8.933e+01 9.931e+01 1.196e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-20 14:21:31,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1120913.3333333333, ans=0.0 2023-11-20 14:21:31,600 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.94 vs. limit=12.0 2023-11-20 14:21:35,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1120980.0, ans=0.125 2023-11-20 14:21:36,594 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 168150 2023-11-20 14:21:47,563 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 11850, loss[loss=0.08579, simple_loss=0.1185, pruned_loss=0.01988, audio_tagging_loss=0.006668, over 15146.00 frames. ], tot_loss[loss=0.07884, simple_loss=0.09968, pruned_loss=0.019, audio_tagging_loss=0.01, over 3040332.84 frames. ], batch size: 56, lr: 4.84e-03, grad_scale: 32.0 2023-11-20 14:22:16,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1121180.0, ans=0.125 2023-11-20 14:22:33,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1121246.6666666667, ans=0.1 2023-11-20 14:22:40,241 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 168200 2023-11-20 14:22:51,469 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 11900, loss[loss=0.09069, simple_loss=0.1117, pruned_loss=0.02426, audio_tagging_loss=0.01058, over 16236.00 frames. 
], tot_loss[loss=0.07884, simple_loss=0.09964, pruned_loss=0.01901, audio_tagging_loss=0.01, over 3050366.55 frames. ], batch size: 60, lr: 4.84e-03, grad_scale: 32.0 2023-11-20 14:23:10,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1121446.6666666667, ans=0.0 2023-11-20 14:23:17,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1121513.3333333333, ans=0.125 2023-11-20 14:23:27,237 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.121e+01 8.163e+01 8.778e+01 9.504e+01 1.300e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-20 14:23:28,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1121513.3333333333, ans=0.125 2023-11-20 14:23:33,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1121580.0, ans=0.125 2023-11-20 14:23:40,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1121580.0, ans=0.125 2023-11-20 14:23:45,749 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 168250 2023-11-20 14:23:56,549 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 11950, loss[loss=0.06919, simple_loss=0.08267, pruned_loss=0.01604, audio_tagging_loss=0.01181, over 15449.00 frames. ], tot_loss[loss=0.07854, simple_loss=0.09924, pruned_loss=0.0188, audio_tagging_loss=0.01012, over 3058758.52 frames. ], batch size: 57, lr: 4.84e-03, grad_scale: 32.0 2023-11-20 14:24:29,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1121846.6666666667, ans=0.0 2023-11-20 14:24:38,965 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.39 vs. limit=15.0 2023-11-20 14:24:39,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1121913.3333333333, ans=0.1 2023-11-20 14:24:48,381 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 168300 2023-11-20 14:24:58,994 INFO [train_asr.py:1262] (0/4) Epoch 14, batch 12000, loss[loss=0.07865, simple_loss=0.1035, pruned_loss=0.01553, audio_tagging_loss=0.01139, over 14504.00 frames. ], tot_loss[loss=0.07859, simple_loss=0.0994, pruned_loss=0.01873, audio_tagging_loss=0.01016, over 3051776.93 frames. ], batch size: 56, lr: 4.84e-03, grad_scale: 32.0 2023-11-20 14:24:58,997 INFO [train_asr.py:1285] (0/4) Computing validation loss 2023-11-20 14:25:36,345 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1494, 2.2678, 5.0137, 2.6755], device='cuda:0') 2023-11-20 14:25:41,041 INFO [train_asr.py:1294] (0/4) Epoch 14, validation: loss=0.06236, simple_loss=0.05348, pruned_loss=0.005638, audio_tagging_loss=0.02999, over 4681554.00 frames. 
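The stretch from "Computing validation loss" to "Epoch 14, validation: loss=..." above is a full pass over a held-out set, reported as frame-weighted averages over 4681554 frames. A minimal sketch of such a pass follows; the compute_loss signature and the dataloader are assumptions for illustration, not taken from this log.

import torch

def validate(model, valid_dl, compute_loss, device):
    # No gradients, eval mode; accumulate each loss weighted by its frames.
    model.eval()
    totals, total_frames = {}, 0.0
    with torch.no_grad():
        for batch in valid_dl:
            losses, num_frames = compute_loss(model, batch, device)
            total_frames += num_frames
            for name, value in losses.items():
                totals[name] = totals.get(name, 0.0) + value * num_frames
    model.train()
    return {name: s / total_frames for name, s in totals.items()}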
2023-11-20 14:25:41,042 INFO [train_asr.py:1295] (0/4) Maximum memory allocated so far is 25925MB
2023-11-20 14:25:45,975 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 14:26:09,041 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-14.pt
2023-11-20 14:26:46,225 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 0, loss[loss=0.08484, simple_loss=0.09368, pruned_loss=0.01497, audio_tagging_loss=0.02302, over 16188.00 frames. ], tot_loss[loss=0.08484, simple_loss=0.09368, pruned_loss=0.01497, audio_tagging_loss=0.02302, over 16188.00 frames. ], batch size: 58, lr: 4.68e-03, grad_scale: 32.0
2023-11-20 14:26:46,228 INFO [train_asr.py:1285] (0/4) Computing validation loss
2023-11-20 14:27:01,203 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.7276, 4.3048, 4.6494, 4.1125], device='cuda:0')
2023-11-20 14:27:21,778 INFO [train_asr.py:1294] (0/4) Epoch 15, validation: loss=0.06153, simple_loss=0.05347, pruned_loss=0.005654, audio_tagging_loss=0.02914, over 4681554.00 frames.
2023-11-20 14:27:21,779 INFO [train_asr.py:1295] (0/4) Maximum memory allocated so far is 25925MB
2023-11-20 14:27:26,690 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.292e+01 9.006e+01 9.902e+01 1.226e+02, threshold=1.801e+02, percent-clipped=0.0
2023-11-20 14:27:40,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1122266.6666666667, ans=0.125
2023-11-20 14:27:44,733 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 168350
2023-11-20 14:27:54,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1122333.3333333333, ans=0.1
2023-11-20 14:28:08,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1122400.0, ans=0.125
2023-11-20 14:28:18,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1122466.6666666667, ans=0.125
2023-11-20 14:28:22,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1122466.6666666667, ans=0.1
2023-11-20 14:28:25,991 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 50, loss[loss=0.109, simple_loss=0.1379, pruned_loss=0.02573, audio_tagging_loss=0.01428, over 16483.00 frames. ], tot_loss[loss=0.08783, simple_loss=0.09886, pruned_loss=0.01894, audio_tagging_loss=0.01946, over 691726.11 frames. ], batch size: 58, lr: 4.67e-03, grad_scale: 32.0
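Most scaling.py:213 entries above print a ScheduledFloat: a module constant such as a dropout rate, skip rate, or balancer probability whose value ("ans=...") is a function of batch_count rather than a fixed hyperparameter. A sketch of one plausible realization, piecewise-linear in batch count, follows; the breakpoints are invented for illustration, and only the schedule-by-batch-count idea is implied by the log.

import bisect

class ScheduledValue:
    # Piecewise-linear function of batch_count, e.g. for a skip rate.
    def __init__(self, points):               # [(batch_count, value), ...]
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        xs = [x for x, _ in self.points]
        i = bisect.bisect_right(xs, batch_count)
        if i == 0:
            return self.points[0][1]
        if i == len(self.points):
            return self.points[-1][1]          # clamp past the last point
        (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
        return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

skip_rate = ScheduledValue([(0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0)])
print(skip_rate(1122266.67))  # far past the last breakpoint -> 0.0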
2023-11-20 14:28:26,953 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.76 vs. limit=15.0
2023-11-20 14:28:50,229 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 168400
2023-11-20 14:28:57,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1122666.6666666667, ans=0.2
2023-11-20 14:29:02,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1122666.6666666667, ans=0.0
2023-11-20 14:29:03,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1122666.6666666667, ans=0.125
2023-11-20 14:29:26,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1122800.0, ans=0.125
2023-11-20 14:29:32,378 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 100, loss[loss=0.06928, simple_loss=0.08882, pruned_loss=0.01061, audio_tagging_loss=0.01426, over 14689.00 frames. ], tot_loss[loss=0.08741, simple_loss=0.09972, pruned_loss=0.0192, audio_tagging_loss=0.01835, over 1214467.72 frames. ], batch size: 55, lr: 4.67e-03, grad_scale: 16.0
2023-11-20 14:29:38,105 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.38 vs. limit=15.0
2023-11-20 14:29:39,161 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.599e+01 8.769e+01 9.395e+01 1.004e+02 1.341e+02, threshold=1.879e+02, percent-clipped=0.0
2023-11-20 14:29:55,513 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.27 vs. limit=22.5
2023-11-20 14:29:56,021 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 168450
2023-11-20 14:30:26,633 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.54 vs. limit=12.0
2023-11-20 14:30:37,466 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 150, loss[loss=0.08478, simple_loss=0.1108, pruned_loss=0.01752, audio_tagging_loss=0.01186, over 15252.00 frames. ], tot_loss[loss=0.0851, simple_loss=0.09936, pruned_loss=0.01879, audio_tagging_loss=0.01663, over 1625374.93 frames. ], batch size: 56, lr: 4.67e-03, grad_scale: 16.0
2023-11-20 14:30:40,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1123200.0, ans=0.2
2023-11-20 14:30:43,833 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.37 vs. limit=6.0
2023-11-20 14:30:52,274 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.42 vs.
limit=12.0 2023-11-20 14:30:58,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1123266.6666666667, ans=0.2 2023-11-20 14:31:01,041 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 168500 2023-11-20 14:31:02,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1123333.3333333333, ans=0.1 2023-11-20 14:31:06,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1123333.3333333333, ans=0.125 2023-11-20 14:31:10,935 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.89 vs. limit=22.5 2023-11-20 14:31:13,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1123333.3333333333, ans=0.125 2023-11-20 14:31:42,718 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 200, loss[loss=0.0735, simple_loss=0.09888, pruned_loss=0.0166, audio_tagging_loss=0.007456, over 15596.00 frames. ], tot_loss[loss=0.08389, simple_loss=0.1007, pruned_loss=0.01897, audio_tagging_loss=0.01454, over 1945309.20 frames. ], batch size: 57, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:31:48,826 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.970e+01 8.233e+01 8.956e+01 9.883e+01 1.318e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-20 14:31:53,868 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.68 vs. limit=15.0 2023-11-20 14:31:54,905 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.54 vs. limit=15.0 2023-11-20 14:31:59,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1123600.0, ans=0.1 2023-11-20 14:31:59,315 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.51 vs. limit=15.0 2023-11-20 14:32:06,145 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 168550 2023-11-20 14:32:48,647 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 250, loss[loss=0.06794, simple_loss=0.07774, pruned_loss=0.01452, audio_tagging_loss=0.01455, over 13944.00 frames. ], tot_loss[loss=0.0827, simple_loss=0.1009, pruned_loss=0.01908, audio_tagging_loss=0.01317, over 2193164.28 frames. ], batch size: 55, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:32:52,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1123866.6666666667, ans=0.2 2023-11-20 14:33:11,781 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 168600 2023-11-20 14:33:18,584 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.26 vs. limit=15.0 2023-11-20 14:33:21,865 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.16 vs. 
limit=15.0 2023-11-20 14:33:26,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1124066.6666666667, ans=0.07 2023-11-20 14:33:33,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1124066.6666666667, ans=0.125 2023-11-20 14:33:54,397 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 300, loss[loss=0.07711, simple_loss=0.1023, pruned_loss=0.01476, audio_tagging_loss=0.01122, over 15094.00 frames. ], tot_loss[loss=0.08275, simple_loss=0.102, pruned_loss=0.01948, audio_tagging_loss=0.01225, over 2391428.32 frames. ], batch size: 56, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:33:54,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1124200.0, ans=0.07 2023-11-20 14:34:00,427 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.977e+01 8.500e+01 9.120e+01 9.945e+01 1.401e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-20 14:34:10,660 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.34 vs. limit=15.0 2023-11-20 14:34:17,732 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 168650 2023-11-20 14:34:36,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1124400.0, ans=0.125 2023-11-20 14:34:48,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1124466.6666666667, ans=0.125 2023-11-20 14:34:59,626 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 350, loss[loss=0.08866, simple_loss=0.1136, pruned_loss=0.02095, audio_tagging_loss=0.01092, over 16959.00 frames. ], tot_loss[loss=0.08202, simple_loss=0.1019, pruned_loss=0.0195, audio_tagging_loss=0.01157, over 2544880.09 frames. ], batch size: 61, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:35:14,841 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.95 vs. limit=6.0 2023-11-20 14:35:24,839 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 168700 2023-11-20 14:35:29,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1124666.6666666667, ans=0.125 2023-11-20 14:35:33,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1124666.6666666667, ans=0.125 2023-11-20 14:35:35,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1124666.6666666667, ans=0.1 2023-11-20 14:35:56,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1124800.0, ans=0.2 2023-11-20 14:36:04,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1124800.0, ans=0.125 2023-11-20 14:36:06,982 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 400, loss[loss=0.07213, simple_loss=0.09823, pruned_loss=0.01336, audio_tagging_loss=0.009661, over 16494.00 frames. ], tot_loss[loss=0.08129, simple_loss=0.1012, pruned_loss=0.01952, audio_tagging_loss=0.01115, over 2661705.94 frames. 
], batch size: 61, lr: 4.67e-03, grad_scale: 32.0
2023-11-20 14:36:09,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1124866.6666666667, ans=0.125
2023-11-20 14:36:13,876 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.381e+01 8.162e+01 9.229e+01 1.065e+02 1.239e+02, threshold=1.846e+02, percent-clipped=0.0
2023-11-20 14:36:18,364 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.33 vs. limit=22.5
2023-11-20 14:36:30,692 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 168750
2023-11-20 14:36:31,238 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.06 vs. limit=15.0
2023-11-20 14:36:46,887 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0
2023-11-20 14:36:49,551 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.54 vs. limit=22.5
2023-11-20 14:37:02,714 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.57 vs. limit=15.0
2023-11-20 14:37:07,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1125133.3333333333, ans=0.125
2023-11-20 14:37:12,780 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 450, loss[loss=0.07887, simple_loss=0.09409, pruned_loss=0.01828, audio_tagging_loss=0.01355, over 15964.00 frames. ], tot_loss[loss=0.08053, simple_loss=0.1007, pruned_loss=0.01936, audio_tagging_loss=0.01085, over 2745526.05 frames. ], batch size: 61, lr: 4.67e-03, grad_scale: 16.0
2023-11-20 14:37:16,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1125200.0, ans=0.05
2023-11-20 14:37:23,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1125266.6666666667, ans=0.0
2023-11-20 14:37:35,546 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 168800
2023-11-20 14:38:17,235 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 500, loss[loss=0.08957, simple_loss=0.1132, pruned_loss=0.02016, audio_tagging_loss=0.01281, over 15242.00 frames. ], tot_loss[loss=0.07963, simple_loss=0.09991, pruned_loss=0.0191, audio_tagging_loss=0.01057, over 2806822.14 frames. ], batch size: 57, lr: 4.67e-03, grad_scale: 16.0
2023-11-20 14:38:24,587 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.837e+01 8.018e+01 8.483e+01 9.528e+01 1.143e+02, threshold=1.697e+02, percent-clipped=0.0
2023-11-20 14:38:29,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1125600.0, ans=0.125
2023-11-20 14:38:38,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1125600.0, ans=0.0
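The optim.py:476 entries above are self-consistent in one respect worth noting: with Clipping_scale=2.0, the printed threshold is twice the middle quartile of recent gradient norms (here 2.0 x 9.229e+01 = 1.846e+02), and percent-clipped reports how often a batch exceeded it. A sketch of that bookkeeping follows; the history length and the exact quantile call are assumptions consistent with the logged numbers, not code from optim.py.

import torch
from collections import deque

class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, history: int = 400):
        self.scale = clipping_scale
        self.norms = deque(maxlen=history)  # recent total gradient norms
        self.num_clipped = 0

    def clip_(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)
        # Quartiles of the recent-norm buffer are what get logged.
        q = torch.quantile(torch.tensor(list(self.norms)),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.scale * q[2].item()   # 2.0 * median, as logged
        if norm > threshold:
            self.num_clipped += 1
            for p in params:                   # rescale so norm == threshold
                p.grad.mul_(threshold / norm)
        return threshold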
2023-11-20 14:38:40,669 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.88 vs. limit=22.5
2023-11-20 14:38:41,287 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 168850
2023-11-20 14:38:43,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1125666.6666666667, ans=0.1
2023-11-20 14:39:00,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1125733.3333333333, ans=10.0
2023-11-20 14:39:07,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1125800.0, ans=0.0
2023-11-20 14:39:07,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1125800.0, ans=0.0
2023-11-20 14:39:16,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1125800.0, ans=15.0
2023-11-20 14:39:21,997 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 550, loss[loss=0.07488, simple_loss=0.09656, pruned_loss=0.01734, audio_tagging_loss=0.009261, over 14874.00 frames. ], tot_loss[loss=0.07934, simple_loss=0.09978, pruned_loss=0.019, audio_tagging_loss=0.01045, over 2866528.86 frames. ], batch size: 56, lr: 4.67e-03, grad_scale: 16.0
2023-11-20 14:39:45,559 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 168900
2023-11-20 14:39:53,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1126000.0, ans=0.1
2023-11-20 14:39:59,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1126066.6666666667, ans=0.2
2023-11-20 14:40:04,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1126066.6666666667, ans=0.0
2023-11-20 14:40:27,407 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 600, loss[loss=0.07118, simple_loss=0.08638, pruned_loss=0.01648, audio_tagging_loss=0.01151, over 15177.00 frames. ], tot_loss[loss=0.07903, simple_loss=0.09917, pruned_loss=0.01905, audio_tagging_loss=0.0104, over 2898670.60 frames. ], batch size: 58, lr: 4.67e-03, grad_scale: 16.0
2023-11-20 14:40:33,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1126200.0, ans=0.2
2023-11-20 14:40:35,037 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.998e+01 8.140e+01 8.992e+01 9.843e+01 1.226e+02, threshold=1.798e+02, percent-clipped=0.0
2023-11-20 14:40:46,985 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.57 vs. limit=22.5
2023-11-20 14:40:50,145 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 168950
2023-11-20 14:40:51,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1126333.3333333333, ans=0.2
2023-11-20 14:40:55,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1126333.3333333333, ans=0.2
2023-11-20 14:41:32,776 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 650, loss[loss=0.06016, simple_loss=0.07439, pruned_loss=0.01399, audio_tagging_loss=0.008976, over 14087.00 frames. ], tot_loss[loss=0.07814, simple_loss=0.09793, pruned_loss=0.01873, audio_tagging_loss=0.01044, over 2934648.03 frames.
], batch size: 56, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:41:34,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1126533.3333333333, ans=0.0 2023-11-20 14:41:40,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1126533.3333333333, ans=0.07 2023-11-20 14:41:45,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1126600.0, ans=0.0 2023-11-20 14:41:45,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1126600.0, ans=0.0 2023-11-20 14:41:57,373 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 169000 2023-11-20 14:42:03,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1126666.6666666667, ans=0.0 2023-11-20 14:42:08,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1126666.6666666667, ans=0.05 2023-11-20 14:42:11,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1126666.6666666667, ans=0.2 2023-11-20 14:42:24,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1126800.0, ans=0.0 2023-11-20 14:42:35,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1126800.0, ans=0.125 2023-11-20 14:42:38,536 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 700, loss[loss=0.09308, simple_loss=0.1123, pruned_loss=0.02488, audio_tagging_loss=0.01204, over 14749.00 frames. ], tot_loss[loss=0.07808, simple_loss=0.09817, pruned_loss=0.01874, audio_tagging_loss=0.01026, over 2962630.85 frames. ], batch size: 55, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:42:47,989 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.688e+01 8.098e+01 8.725e+01 9.382e+01 1.189e+02, threshold=1.745e+02, percent-clipped=0.0 2023-11-20 14:43:03,504 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 169050 2023-11-20 14:43:07,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1127000.0, ans=0.2 2023-11-20 14:43:10,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=1127000.0, ans=0.95 2023-11-20 14:43:12,759 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.15 vs. limit=10.0 2023-11-20 14:43:14,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1127000.0, ans=0.5 2023-11-20 14:43:17,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1127066.6666666667, ans=0.125 2023-11-20 14:43:20,335 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.29 vs. 
limit=15.0 2023-11-20 14:43:28,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1127066.6666666667, ans=0.125 2023-11-20 14:43:45,383 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 750, loss[loss=0.05932, simple_loss=0.06734, pruned_loss=0.01272, audio_tagging_loss=0.01294, over 14736.00 frames. ], tot_loss[loss=0.07856, simple_loss=0.09907, pruned_loss=0.01887, audio_tagging_loss=0.01015, over 2986206.27 frames. ], batch size: 56, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:43:48,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1127200.0, ans=0.125 2023-11-20 14:44:07,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1127266.6666666667, ans=0.125 2023-11-20 14:44:08,672 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 169100 2023-11-20 14:44:11,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1127333.3333333333, ans=10.0 2023-11-20 14:44:45,910 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 14:44:50,213 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.70 vs. limit=15.0 2023-11-20 14:44:50,616 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 800, loss[loss=0.1031, simple_loss=0.1255, pruned_loss=0.02956, audio_tagging_loss=0.01079, over 14824.00 frames. ], tot_loss[loss=0.07982, simple_loss=0.1006, pruned_loss=0.01943, audio_tagging_loss=0.01009, over 2997537.22 frames. ], batch size: 56, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:44:57,937 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.496e+01 8.097e+01 8.575e+01 9.313e+01 1.221e+02, threshold=1.715e+02, percent-clipped=0.0 2023-11-20 14:45:03,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1127600.0, ans=0.0 2023-11-20 14:45:13,887 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 169150 2023-11-20 14:45:23,497 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.80 vs. limit=15.0 2023-11-20 14:45:25,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1127666.6666666667, ans=0.0 2023-11-20 14:45:36,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1127733.3333333333, ans=0.125 2023-11-20 14:45:56,209 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 850, loss[loss=0.0977, simple_loss=0.1207, pruned_loss=0.02748, audio_tagging_loss=0.009871, over 14869.00 frames. ], tot_loss[loss=0.07884, simple_loss=0.09892, pruned_loss=0.01912, audio_tagging_loss=0.01026, over 2997751.63 frames. 
], batch size: 57, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:46:04,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1127866.6666666667, ans=0.125 2023-11-20 14:46:17,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1127933.3333333333, ans=0.2 2023-11-20 14:46:21,171 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 169200 2023-11-20 14:46:24,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1128000.0, ans=0.07 2023-11-20 14:46:24,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1128000.0, ans=0.125 2023-11-20 14:46:30,738 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.81 vs. limit=12.0 2023-11-20 14:46:34,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1128000.0, ans=0.125 2023-11-20 14:46:37,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1128066.6666666667, ans=0.125 2023-11-20 14:46:52,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1128133.3333333333, ans=0.1 2023-11-20 14:47:02,638 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 900, loss[loss=0.08442, simple_loss=0.1031, pruned_loss=0.02083, audio_tagging_loss=0.01203, over 14195.00 frames. ], tot_loss[loss=0.07849, simple_loss=0.09843, pruned_loss=0.01894, audio_tagging_loss=0.01034, over 3009052.82 frames. ], batch size: 55, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:47:11,336 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.656e+01 8.130e+01 8.827e+01 9.752e+01 1.444e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-20 14:47:26,451 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 169250 2023-11-20 14:47:50,421 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.06 vs. limit=22.5 2023-11-20 14:48:07,412 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 950, loss[loss=0.1001, simple_loss=0.1263, pruned_loss=0.02743, audio_tagging_loss=0.009562, over 15501.00 frames. ], tot_loss[loss=0.07944, simple_loss=0.1, pruned_loss=0.01925, audio_tagging_loss=0.01017, over 3021471.07 frames. ], batch size: 56, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:48:12,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1128533.3333333333, ans=0.5 2023-11-20 14:48:30,314 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 169300 2023-11-20 14:48:49,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1128733.3333333333, ans=0.125 2023-11-20 14:48:50,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1128733.3333333333, ans=0.1 2023-11-20 14:48:57,216 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.44 vs. 
limit=22.5 2023-11-20 14:48:58,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1128800.0, ans=0.125 2023-11-20 14:49:10,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1128866.6666666667, ans=0.0 2023-11-20 14:49:11,803 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 1000, loss[loss=0.06398, simple_loss=0.07891, pruned_loss=0.01266, audio_tagging_loss=0.01186, over 15585.00 frames. ], tot_loss[loss=0.0781, simple_loss=0.09822, pruned_loss=0.01878, audio_tagging_loss=0.01021, over 3033094.26 frames. ], batch size: 58, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:49:19,998 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.688e+01 8.251e+01 8.894e+01 9.437e+01 1.345e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-20 14:49:30,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1128933.3333333333, ans=0.125 2023-11-20 14:49:35,958 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 169350 2023-11-20 14:49:40,310 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 14:49:47,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1129000.0, ans=0.1 2023-11-20 14:49:51,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1129066.6666666667, ans=0.0 2023-11-20 14:49:52,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1129066.6666666667, ans=0.2 2023-11-20 14:50:04,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1129133.3333333333, ans=0.1 2023-11-20 14:50:07,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1129133.3333333333, ans=0.0 2023-11-20 14:50:17,322 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 1050, loss[loss=0.08073, simple_loss=0.1072, pruned_loss=0.01783, audio_tagging_loss=0.009302, over 14294.00 frames. ], tot_loss[loss=0.07805, simple_loss=0.09853, pruned_loss=0.01872, audio_tagging_loss=0.01007, over 3032103.61 frames. 
], batch size: 54, lr: 4.66e-03, grad_scale: 16.0 2023-11-20 14:50:40,841 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 169400 2023-11-20 14:50:58,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1129400.0, ans=0.125 2023-11-20 14:50:58,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1129400.0, ans=10.0 2023-11-20 14:51:02,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1129400.0, ans=0.0 2023-11-20 14:51:06,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1129400.0, ans=0.0 2023-11-20 14:51:17,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1129466.6666666667, ans=0.0 2023-11-20 14:51:23,872 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 1100, loss[loss=0.06162, simple_loss=0.07726, pruned_loss=0.01328, audio_tagging_loss=0.00971, over 15398.00 frames. ], tot_loss[loss=0.07682, simple_loss=0.09698, pruned_loss=0.01832, audio_tagging_loss=0.01001, over 3033310.96 frames. ], batch size: 59, lr: 4.66e-03, grad_scale: 16.0 2023-11-20 14:51:26,403 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 14:51:29,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1129533.3333333333, ans=0.0 2023-11-20 14:51:32,471 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.387e+01 8.402e+01 8.962e+01 9.739e+01 1.697e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-20 14:51:32,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1129533.3333333333, ans=0.125 2023-11-20 14:51:41,968 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.87 vs. limit=15.0 2023-11-20 14:51:46,459 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.62 vs. 
limit=15.0 2023-11-20 14:51:47,061 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 169450 2023-11-20 14:51:52,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1129666.6666666667, ans=0.0 2023-11-20 14:51:53,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1129666.6666666667, ans=0.125 2023-11-20 14:51:56,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1129666.6666666667, ans=0.125 2023-11-20 14:52:04,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1129733.3333333333, ans=0.035 2023-11-20 14:52:13,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1129733.3333333333, ans=0.1 2023-11-20 14:52:29,253 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 1150, loss[loss=0.1055, simple_loss=0.1466, pruned_loss=0.02701, audio_tagging_loss=0.005164, over 16132.00 frames. ], tot_loss[loss=0.07687, simple_loss=0.0971, pruned_loss=0.01833, audio_tagging_loss=0.009992, over 3033401.46 frames. ], batch size: 56, lr: 4.66e-03, grad_scale: 16.0 2023-11-20 14:52:40,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1129866.6666666667, ans=0.0 2023-11-20 14:52:40,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1129866.6666666667, ans=0.0 2023-11-20 14:52:40,454 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.82 vs. limit=10.0 2023-11-20 14:52:48,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1129933.3333333333, ans=0.0 2023-11-20 14:52:48,472 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.84 vs. limit=12.0 2023-11-20 14:52:53,523 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 169500 2023-11-20 14:53:35,197 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 1200, loss[loss=0.08007, simple_loss=0.1122, pruned_loss=0.01862, audio_tagging_loss=0.005365, over 14763.00 frames. ], tot_loss[loss=0.07636, simple_loss=0.09639, pruned_loss=0.01815, audio_tagging_loss=0.01001, over 3023746.21 frames. ], batch size: 55, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:53:44,433 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.469e+01 8.178e+01 8.897e+01 9.679e+01 1.493e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-20 14:53:45,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1130200.0, ans=0.0 2023-11-20 14:53:51,646 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. 
limit=6.0 2023-11-20 14:53:58,869 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 169550 2023-11-20 14:54:00,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1130333.3333333333, ans=0.125 2023-11-20 14:54:05,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1130333.3333333333, ans=0.125 2023-11-20 14:54:10,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1130333.3333333333, ans=0.1 2023-11-20 14:54:39,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1130533.3333333333, ans=0.125 2023-11-20 14:54:40,152 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 1250, loss[loss=0.06084, simple_loss=0.07545, pruned_loss=0.01555, audio_tagging_loss=0.007569, over 14122.00 frames. ], tot_loss[loss=0.07658, simple_loss=0.09656, pruned_loss=0.01844, audio_tagging_loss=0.009865, over 3022979.11 frames. ], batch size: 53, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:54:49,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1130533.3333333333, ans=0.2 2023-11-20 14:54:59,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1130600.0, ans=0.0 2023-11-20 14:55:00,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1130600.0, ans=0.09899494936611666 2023-11-20 14:55:03,161 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 169600 2023-11-20 14:55:26,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1130733.3333333333, ans=0.0 2023-11-20 14:55:44,761 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 1300, loss[loss=0.06034, simple_loss=0.06237, pruned_loss=0.01247, audio_tagging_loss=0.01668, over 14869.00 frames. ], tot_loss[loss=0.07603, simple_loss=0.09575, pruned_loss=0.01819, audio_tagging_loss=0.009966, over 3029009.01 frames. ], batch size: 59, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:55:53,519 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.559e+01 8.274e+01 8.667e+01 1.016e+02 1.258e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-20 14:56:03,874 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.91 vs. 
limit=10.0 2023-11-20 14:56:07,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1130933.3333333333, ans=0.05 2023-11-20 14:56:08,403 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 169650 2023-11-20 14:56:08,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1130933.3333333333, ans=0.125 2023-11-20 14:56:14,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1131000.0, ans=0.1 2023-11-20 14:56:17,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=1131000.0, ans=0.05 2023-11-20 14:56:26,992 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.20 vs. limit=15.0 2023-11-20 14:56:30,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1131066.6666666667, ans=0.0 2023-11-20 14:56:49,816 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 1350, loss[loss=0.09918, simple_loss=0.1353, pruned_loss=0.02506, audio_tagging_loss=0.006455, over 15859.00 frames. ], tot_loss[loss=0.07671, simple_loss=0.09647, pruned_loss=0.01848, audio_tagging_loss=0.009992, over 3029417.33 frames. ], batch size: 59, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:57:13,645 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 169700 2023-11-20 14:57:13,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1131266.6666666667, ans=0.0 2023-11-20 14:57:17,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1131333.3333333333, ans=0.0 2023-11-20 14:57:27,209 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.64 vs. limit=15.0 2023-11-20 14:57:36,975 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 14:57:53,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1131466.6666666667, ans=0.125 2023-11-20 14:57:54,514 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 14:57:55,508 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 1400, loss[loss=0.06758, simple_loss=0.08089, pruned_loss=0.0172, audio_tagging_loss=0.009937, over 14929.00 frames. ], tot_loss[loss=0.07684, simple_loss=0.09647, pruned_loss=0.01862, audio_tagging_loss=0.009984, over 3035139.10 frames. 
], batch size: 59, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:58:01,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1131533.3333333333, ans=0.2 2023-11-20 14:58:04,237 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.814e+01 7.998e+01 8.583e+01 9.280e+01 1.349e+02, threshold=1.717e+02, percent-clipped=0.0 2023-11-20 14:58:19,176 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 169750 2023-11-20 14:58:20,006 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.99 vs. limit=15.0 2023-11-20 14:58:36,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1131733.3333333333, ans=0.125 2023-11-20 14:59:00,594 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 1450, loss[loss=0.06106, simple_loss=0.07969, pruned_loss=0.01124, audio_tagging_loss=0.009985, over 15130.00 frames. ], tot_loss[loss=0.07769, simple_loss=0.09768, pruned_loss=0.01886, audio_tagging_loss=0.009992, over 3039511.19 frames. ], batch size: 57, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:59:06,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1131866.6666666667, ans=0.125 2023-11-20 14:59:23,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1131933.3333333333, ans=0.2 2023-11-20 14:59:24,564 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 169800 2023-11-20 14:59:49,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.whiten.whitening_limit, batch_count=1132066.6666666667, ans=15.0 2023-11-20 14:59:52,930 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.80 vs. limit=15.0 2023-11-20 15:00:06,398 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 1500, loss[loss=0.07228, simple_loss=0.08346, pruned_loss=0.01911, audio_tagging_loss=0.01144, over 15132.00 frames. ], tot_loss[loss=0.07839, simple_loss=0.09858, pruned_loss=0.0192, audio_tagging_loss=0.009902, over 3040413.25 frames. ], batch size: 58, lr: 4.65e-03, grad_scale: 16.0 2023-11-20 15:00:14,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1132200.0, ans=0.125 2023-11-20 15:00:17,051 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.235e+01 8.270e+01 9.018e+01 9.743e+01 1.216e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-20 15:00:29,973 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 169850 2023-11-20 15:00:33,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1132333.3333333333, ans=0.1 2023-11-20 15:00:35,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1132333.3333333333, ans=0.0 2023-11-20 15:00:46,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1132400.0, ans=0.0 2023-11-20 15:01:11,542 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 1550, loss[loss=0.06957, simple_loss=0.09121, pruned_loss=0.01506, audio_tagging_loss=0.008909, over 14420.00 frames. 
], tot_loss[loss=0.07923, simple_loss=0.09965, pruned_loss=0.01936, audio_tagging_loss=0.01004, over 3040878.46 frames. ], batch size: 56, lr: 4.65e-03, grad_scale: 16.0 2023-11-20 15:01:34,437 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 169900 2023-11-20 15:01:36,146 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.24 vs. limit=15.0 2023-11-20 15:01:59,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1132733.3333333333, ans=0.125 2023-11-20 15:02:00,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1132733.3333333333, ans=0.125 2023-11-20 15:02:15,850 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 1600, loss[loss=0.06037, simple_loss=0.07027, pruned_loss=0.01355, audio_tagging_loss=0.01168, over 15355.00 frames. ], tot_loss[loss=0.07918, simple_loss=0.09955, pruned_loss=0.01919, audio_tagging_loss=0.01022, over 3046475.32 frames. ], batch size: 62, lr: 4.65e-03, grad_scale: 32.0 2023-11-20 15:02:20,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1132866.6666666667, ans=0.125 2023-11-20 15:02:22,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=1132866.6666666667, ans=15.0 2023-11-20 15:02:24,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1132866.6666666667, ans=0.125 2023-11-20 15:02:26,263 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.675e+01 8.383e+01 8.914e+01 9.693e+01 2.648e+02, threshold=1.783e+02, percent-clipped=1.0 2023-11-20 15:02:37,070 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 15:02:39,773 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 169950 2023-11-20 15:03:07,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1133133.3333333333, ans=0.2 2023-11-20 15:03:09,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1133133.3333333333, ans=0.0 2023-11-20 15:03:19,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1133200.0, ans=0.125 2023-11-20 15:03:20,761 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 1650, loss[loss=0.08823, simple_loss=0.1129, pruned_loss=0.02183, audio_tagging_loss=0.009961, over 15871.00 frames. ], tot_loss[loss=0.07915, simple_loss=0.09948, pruned_loss=0.01912, audio_tagging_loss=0.01029, over 3051129.59 frames. ], batch size: 58, lr: 4.65e-03, grad_scale: 32.0 2023-11-20 15:03:44,577 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 170000 2023-11-20 15:03:47,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1133333.3333333333, ans=0.125 2023-11-20 15:04:01,409 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.96 vs. 
limit=15.0 2023-11-20 15:04:02,630 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.26 vs. limit=6.0 2023-11-20 15:04:22,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1133466.6666666667, ans=0.125 2023-11-20 15:04:23,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1133466.6666666667, ans=0.2 2023-11-20 15:04:26,886 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 1700, loss[loss=0.08258, simple_loss=0.1036, pruned_loss=0.02022, audio_tagging_loss=0.01058, over 16625.00 frames. ], tot_loss[loss=0.07878, simple_loss=0.0992, pruned_loss=0.01887, audio_tagging_loss=0.01031, over 3055316.50 frames. ], batch size: 59, lr: 4.65e-03, grad_scale: 32.0 2023-11-20 15:04:30,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1133533.3333333333, ans=0.0 2023-11-20 15:04:36,554 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.057e+01 7.994e+01 8.673e+01 9.340e+01 1.265e+02, threshold=1.735e+02, percent-clipped=0.0 2023-11-20 15:04:40,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1133600.0, ans=0.125 2023-11-20 15:04:48,877 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 170050 2023-11-20 15:05:21,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1133800.0, ans=0.1 2023-11-20 15:05:23,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1133800.0, ans=0.025 2023-11-20 15:05:30,946 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 1750, loss[loss=0.06285, simple_loss=0.07033, pruned_loss=0.01375, audio_tagging_loss=0.01394, over 15752.00 frames. ], tot_loss[loss=0.07812, simple_loss=0.09854, pruned_loss=0.01865, audio_tagging_loss=0.0102, over 3059599.64 frames. ], batch size: 60, lr: 4.65e-03, grad_scale: 32.0 2023-11-20 15:05:39,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1133866.6666666667, ans=0.1 2023-11-20 15:05:54,403 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 170100 2023-11-20 15:06:02,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1134000.0, ans=10.0 2023-11-20 15:06:10,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1134066.6666666667, ans=0.125 2023-11-20 15:06:11,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1134066.6666666667, ans=0.125 2023-11-20 15:06:30,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1134133.3333333333, ans=0.125 2023-11-20 15:06:34,899 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 1800, loss[loss=0.09939, simple_loss=0.1351, pruned_loss=0.02344, audio_tagging_loss=0.008405, over 15277.00 frames. ], tot_loss[loss=0.07819, simple_loss=0.09865, pruned_loss=0.01873, audio_tagging_loss=0.01014, over 3053941.99 frames. 
], batch size: 55, lr: 4.65e-03, grad_scale: 32.0 2023-11-20 15:06:46,191 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 5.918e+01 8.061e+01 8.642e+01 9.411e+01 1.208e+02, threshold=1.728e+02, percent-clipped=0.0 2023-11-20 15:06:53,278 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.30 vs. limit=15.0 2023-11-20 15:06:59,347 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 170150 2023-11-20 15:07:06,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1134333.3333333333, ans=0.125 2023-11-20 15:07:14,755 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.17 vs. limit=12.0 2023-11-20 15:07:25,850 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.99 vs. limit=22.5 2023-11-20 15:07:35,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1134466.6666666667, ans=0.125 2023-11-20 15:07:38,834 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.11 vs. limit=15.0 2023-11-20 15:07:40,582 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 1850, loss[loss=0.08785, simple_loss=0.1077, pruned_loss=0.0204, audio_tagging_loss=0.0136, over 15585.00 frames. ], tot_loss[loss=0.07817, simple_loss=0.0985, pruned_loss=0.01874, audio_tagging_loss=0.01017, over 3053926.43 frames. ], batch size: 58, lr: 4.65e-03, grad_scale: 16.0 2023-11-20 15:07:45,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1134533.3333333333, ans=0.125 2023-11-20 15:07:47,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1134533.3333333333, ans=0.125 2023-11-20 15:08:03,586 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 170200 2023-11-20 15:08:03,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1134600.0, ans=0.2 2023-11-20 15:08:16,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=1134666.6666666667, ans=0.02 2023-11-20 15:08:21,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1134733.3333333333, ans=0.125 2023-11-20 15:08:34,832 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.04 vs. limit=15.0 2023-11-20 15:08:42,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1134800.0, ans=0.125 2023-11-20 15:08:45,387 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 1900, loss[loss=0.05784, simple_loss=0.07262, pruned_loss=0.00812, audio_tagging_loss=0.01342, over 14789.00 frames. ], tot_loss[loss=0.07762, simple_loss=0.0982, pruned_loss=0.01848, audio_tagging_loss=0.01004, over 3051944.74 frames. 
], batch size: 56, lr: 4.65e-03, grad_scale: 16.0 2023-11-20 15:08:46,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=1134866.6666666667, ans=15.0 2023-11-20 15:08:56,320 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.854e+01 8.016e+01 8.806e+01 9.698e+01 1.880e+02, threshold=1.761e+02, percent-clipped=1.0 2023-11-20 15:09:04,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1134933.3333333333, ans=0.2 2023-11-20 15:09:08,253 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 170250 2023-11-20 15:09:08,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1134933.3333333333, ans=0.07 2023-11-20 15:09:17,531 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.33 vs. limit=15.0 2023-11-20 15:09:19,858 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.86 vs. limit=15.0 2023-11-20 15:09:28,730 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 15:09:45,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1135133.3333333333, ans=0.125 2023-11-20 15:09:48,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1135200.0, ans=0.125 2023-11-20 15:09:49,301 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 1950, loss[loss=0.06985, simple_loss=0.0931, pruned_loss=0.01236, audio_tagging_loss=0.01094, over 15954.00 frames. ], tot_loss[loss=0.07772, simple_loss=0.09855, pruned_loss=0.01847, audio_tagging_loss=0.00998, over 3051474.24 frames. ], batch size: 59, lr: 4.65e-03, grad_scale: 16.0 2023-11-20 15:09:53,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1135200.0, ans=0.025 2023-11-20 15:10:09,278 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 15:10:13,356 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 170300 2023-11-20 15:10:53,729 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 2000, loss[loss=0.04712, simple_loss=0.04282, pruned_loss=0.007039, audio_tagging_loss=0.01867, over 14854.00 frames. ], tot_loss[loss=0.07784, simple_loss=0.09851, pruned_loss=0.01856, audio_tagging_loss=0.01003, over 3049455.54 frames. ], batch size: 59, lr: 4.65e-03, grad_scale: 32.0 2023-11-20 15:10:54,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1135533.3333333333, ans=0.07 2023-11-20 15:10:57,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1135533.3333333333, ans=0.125 2023-11-20 15:11:05,443 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.656e+01 7.869e+01 8.530e+01 9.315e+01 1.202e+02, threshold=1.706e+02, percent-clipped=0.0 2023-11-20 15:11:16,027 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.60 vs. 
limit=12.0 2023-11-20 15:11:16,617 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 170350 2023-11-20 15:11:19,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1135666.6666666667, ans=0.0 2023-11-20 15:11:45,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1135800.0, ans=0.125 2023-11-20 15:11:46,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1135800.0, ans=0.0 2023-11-20 15:11:55,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1135800.0, ans=0.0 2023-11-20 15:11:58,361 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 2050, loss[loss=0.06662, simple_loss=0.08169, pruned_loss=0.01395, audio_tagging_loss=0.01182, over 17114.00 frames. ], tot_loss[loss=0.07728, simple_loss=0.09779, pruned_loss=0.01838, audio_tagging_loss=0.01001, over 3053245.97 frames. ], batch size: 62, lr: 4.65e-03, grad_scale: 32.0 2023-11-20 15:12:14,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1135933.3333333333, ans=0.07 2023-11-20 15:12:21,253 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 170400 2023-11-20 15:12:44,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1136066.6666666667, ans=0.0 2023-11-20 15:13:02,584 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 2100, loss[loss=0.07729, simple_loss=0.1006, pruned_loss=0.01698, audio_tagging_loss=0.01, over 15505.00 frames. ], tot_loss[loss=0.07769, simple_loss=0.09862, pruned_loss=0.01851, audio_tagging_loss=0.009867, over 3053225.39 frames. ], batch size: 58, lr: 4.65e-03, grad_scale: 32.0 2023-11-20 15:13:14,256 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.146e+01 8.226e+01 8.979e+01 1.003e+02 1.386e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-20 15:13:15,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1136266.6666666667, ans=0.0 2023-11-20 15:13:26,697 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 170450 2023-11-20 15:13:27,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1136266.6666666667, ans=0.2 2023-11-20 15:13:28,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1136333.3333333333, ans=0.1 2023-11-20 15:13:30,233 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.43 vs. limit=15.0 2023-11-20 15:13:49,855 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.13 vs. limit=15.0 2023-11-20 15:14:07,086 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 2150, loss[loss=0.08336, simple_loss=0.1048, pruned_loss=0.01987, audio_tagging_loss=0.01111, over 15640.00 frames. ], tot_loss[loss=0.07815, simple_loss=0.09907, pruned_loss=0.01871, audio_tagging_loss=0.009896, over 3044134.60 frames. 
], batch size: 58, lr: 4.65e-03, grad_scale: 32.0 2023-11-20 15:14:08,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1136533.3333333333, ans=0.0 2023-11-20 15:14:16,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1136533.3333333333, ans=0.0 2023-11-20 15:14:30,500 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 170500 2023-11-20 15:14:34,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1136666.6666666667, ans=0.125 2023-11-20 15:14:44,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1136733.3333333333, ans=0.0 2023-11-20 15:14:45,101 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 15:14:50,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=1136733.3333333333, ans=0.5 2023-11-20 15:14:50,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1136733.3333333333, ans=0.0 2023-11-20 15:14:58,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1136800.0, ans=0.125 2023-11-20 15:15:12,306 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 2200, loss[loss=0.085, simple_loss=0.1114, pruned_loss=0.02137, audio_tagging_loss=0.007911, over 14655.00 frames. ], tot_loss[loss=0.07877, simple_loss=0.1001, pruned_loss=0.01895, audio_tagging_loss=0.00979, over 3046203.22 frames. ], batch size: 58, lr: 4.65e-03, grad_scale: 32.0 2023-11-20 15:15:16,942 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.32 vs. 
limit=15.0 2023-11-20 15:15:18,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1136866.6666666667, ans=0.0 2023-11-20 15:15:21,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1136866.6666666667, ans=0.0 2023-11-20 15:15:23,631 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 8.421e+01 8.880e+01 9.731e+01 1.234e+02, threshold=1.776e+02, percent-clipped=0.0 2023-11-20 15:15:34,866 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 170550 2023-11-20 15:15:38,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1137000.0, ans=0.2 2023-11-20 15:15:41,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1137000.0, ans=0.125 2023-11-20 15:15:45,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1137000.0, ans=0.0 2023-11-20 15:16:16,546 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 2250, loss[loss=0.09604, simple_loss=0.1248, pruned_loss=0.02557, audio_tagging_loss=0.008085, over 15513.00 frames. ], tot_loss[loss=0.08006, simple_loss=0.1017, pruned_loss=0.01941, audio_tagging_loss=0.009787, over 3048332.62 frames. ], batch size: 56, lr: 4.64e-03, grad_scale: 32.0 2023-11-20 15:16:36,159 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.13 vs. limit=15.0 2023-11-20 15:16:37,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1137266.6666666667, ans=0.1 2023-11-20 15:16:39,906 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 170600 2023-11-20 15:16:41,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1137333.3333333333, ans=0.125 2023-11-20 15:16:55,767 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.66 vs. limit=22.5 2023-11-20 15:16:58,232 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.39 vs. limit=10.0 2023-11-20 15:17:01,846 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.42 vs. limit=6.0 2023-11-20 15:17:16,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1137466.6666666667, ans=0.05 2023-11-20 15:17:21,562 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 2300, loss[loss=0.06175, simple_loss=0.07594, pruned_loss=0.01166, audio_tagging_loss=0.01212, over 16150.00 frames. ], tot_loss[loss=0.07995, simple_loss=0.1011, pruned_loss=0.01943, audio_tagging_loss=0.009951, over 3056137.45 frames. 
], batch size: 61, lr: 4.64e-03, grad_scale: 32.0 2023-11-20 15:17:27,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1137533.3333333333, ans=0.1 2023-11-20 15:17:33,173 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.682e+01 8.112e+01 8.584e+01 9.317e+01 1.375e+02, threshold=1.717e+02, percent-clipped=0.0 2023-11-20 15:17:45,530 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 170650 2023-11-20 15:17:48,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1137666.6666666667, ans=0.1 2023-11-20 15:17:54,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1137666.6666666667, ans=0.05 2023-11-20 15:17:59,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1137733.3333333333, ans=0.0 2023-11-20 15:18:11,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1137800.0, ans=0.125 2023-11-20 15:18:18,356 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 15:18:26,355 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 2350, loss[loss=0.0739, simple_loss=0.09203, pruned_loss=0.0188, audio_tagging_loss=0.009088, over 14523.00 frames. ], tot_loss[loss=0.07915, simple_loss=0.09997, pruned_loss=0.01912, audio_tagging_loss=0.01004, over 3053186.37 frames. ], batch size: 55, lr: 4.64e-03, grad_scale: 32.0 2023-11-20 15:18:29,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1137866.6666666667, ans=0.125 2023-11-20 15:18:33,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1137866.6666666667, ans=0.05 2023-11-20 15:18:35,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1137866.6666666667, ans=0.125 2023-11-20 15:18:41,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1137933.3333333333, ans=0.125 2023-11-20 15:18:48,996 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 170700 2023-11-20 15:18:59,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1138000.0, ans=0.2 2023-11-20 15:19:30,691 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 2400, loss[loss=0.06316, simple_loss=0.07366, pruned_loss=0.01352, audio_tagging_loss=0.0128, over 14377.00 frames. ], tot_loss[loss=0.07947, simple_loss=0.1002, pruned_loss=0.01921, audio_tagging_loss=0.01015, over 3053059.55 frames. 
], batch size: 53, lr: 4.64e-03, grad_scale: 32.0 2023-11-20 15:19:40,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1138200.0, ans=0.125 2023-11-20 15:19:42,905 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.480e+01 8.080e+01 8.821e+01 9.568e+01 1.388e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-20 15:19:45,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1138266.6666666667, ans=0.125 2023-11-20 15:19:47,880 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.51 vs. limit=22.5 2023-11-20 15:19:54,180 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 170750 2023-11-20 15:20:04,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1138333.3333333333, ans=0.95 2023-11-20 15:20:07,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1138333.3333333333, ans=0.0 2023-11-20 15:20:28,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1138466.6666666667, ans=0.125 2023-11-20 15:20:30,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1138466.6666666667, ans=0.125 2023-11-20 15:20:35,641 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 2450, loss[loss=0.07938, simple_loss=0.09871, pruned_loss=0.01891, audio_tagging_loss=0.01111, over 15085.00 frames. ], tot_loss[loss=0.07944, simple_loss=0.1003, pruned_loss=0.01918, audio_tagging_loss=0.01009, over 3054167.56 frames. ], batch size: 57, lr: 4.64e-03, grad_scale: 16.0 2023-11-20 15:20:43,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1138533.3333333333, ans=0.125 2023-11-20 15:20:43,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1138533.3333333333, ans=0.125 2023-11-20 15:20:51,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1138600.0, ans=0.1 2023-11-20 15:20:55,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1138600.0, ans=0.0 2023-11-20 15:20:59,020 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 170800 2023-11-20 15:21:05,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1138666.6666666667, ans=0.125 2023-11-20 15:21:07,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=1138666.6666666667, ans=0.5 2023-11-20 15:21:41,382 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 2500, loss[loss=0.07159, simple_loss=0.09553, pruned_loss=0.01635, audio_tagging_loss=0.007478, over 14661.00 frames. ], tot_loss[loss=0.0789, simple_loss=0.0995, pruned_loss=0.01899, audio_tagging_loss=0.01016, over 3053582.36 frames. 
], batch size: 55, lr: 4.64e-03, grad_scale: 16.0 2023-11-20 15:21:54,767 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.766e+01 8.090e+01 8.633e+01 9.562e+01 1.495e+02, threshold=1.727e+02, percent-clipped=0.0 2023-11-20 15:22:04,113 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 170850 2023-11-20 15:22:19,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1139066.6666666667, ans=0.0 2023-11-20 15:22:24,417 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.00 vs. limit=15.0 2023-11-20 15:22:25,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1139066.6666666667, ans=0.1 2023-11-20 15:22:31,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1139133.3333333333, ans=0.2 2023-11-20 15:22:34,643 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.47 vs. limit=15.0 2023-11-20 15:22:45,288 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 2550, loss[loss=0.0618, simple_loss=0.07671, pruned_loss=0.01194, audio_tagging_loss=0.0115, over 14857.00 frames. ], tot_loss[loss=0.0776, simple_loss=0.09783, pruned_loss=0.0186, audio_tagging_loss=0.01008, over 3049744.61 frames. ], batch size: 58, lr: 4.64e-03, grad_scale: 16.0 2023-11-20 15:22:46,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1139200.0, ans=0.125 2023-11-20 15:22:46,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1139200.0, ans=0.125 2023-11-20 15:22:57,805 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.55 vs. 
limit=15.0 2023-11-20 15:23:03,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1139266.6666666667, ans=0.125 2023-11-20 15:23:06,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1139266.6666666667, ans=0.125 2023-11-20 15:23:08,688 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 170900 2023-11-20 15:23:08,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1139266.6666666667, ans=0.125 2023-11-20 15:23:14,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1139333.3333333333, ans=0.125 2023-11-20 15:23:20,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1139333.3333333333, ans=15.0 2023-11-20 15:23:21,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1139333.3333333333, ans=0.125 2023-11-20 15:23:30,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1139400.0, ans=0.0 2023-11-20 15:23:37,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1139466.6666666667, ans=0.125 2023-11-20 15:23:50,111 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 2600, loss[loss=0.06277, simple_loss=0.07331, pruned_loss=0.01529, audio_tagging_loss=0.01082, over 14531.00 frames. ], tot_loss[loss=0.07773, simple_loss=0.09818, pruned_loss=0.01871, audio_tagging_loss=0.009929, over 3042502.80 frames. ], batch size: 59, lr: 4.64e-03, grad_scale: 16.0 2023-11-20 15:24:03,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1139600.0, ans=0.0 2023-11-20 15:24:04,208 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.595e+01 8.100e+01 8.871e+01 9.785e+01 4.234e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-20 15:24:13,754 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 170950 2023-11-20 15:24:55,134 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 2650, loss[loss=0.05909, simple_loss=0.07542, pruned_loss=0.0129, audio_tagging_loss=0.008486, over 15445.00 frames. ], tot_loss[loss=0.07745, simple_loss=0.09774, pruned_loss=0.01873, audio_tagging_loss=0.009852, over 3047386.29 frames. ], batch size: 57, lr: 4.64e-03, grad_scale: 16.0 2023-11-20 15:25:10,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1139933.3333333333, ans=0.125 2023-11-20 15:25:13,391 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.26 vs. limit=22.5 2023-11-20 15:25:18,202 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 171000 2023-11-20 15:25:50,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1140133.3333333333, ans=0.125 2023-11-20 15:26:00,166 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 2700, loss[loss=0.09299, simple_loss=0.1235, pruned_loss=0.02359, audio_tagging_loss=0.007668, over 15967.00 frames. 
], tot_loss[loss=0.07705, simple_loss=0.09714, pruned_loss=0.01861, audio_tagging_loss=0.009866, over 3048216.53 frames. ], batch size: 56, lr: 4.64e-03, grad_scale: 16.0 2023-11-20 15:26:14,313 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.575e+01 8.137e+01 8.718e+01 9.635e+01 1.314e+02, threshold=1.744e+02, percent-clipped=1.0 2023-11-20 15:26:20,519 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.78 vs. limit=22.5 2023-11-20 15:26:23,566 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 171050 2023-11-20 15:26:24,039 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.17 vs. limit=15.0 2023-11-20 15:26:27,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1140333.3333333333, ans=0.1 2023-11-20 15:27:04,589 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 2750, loss[loss=0.09456, simple_loss=0.1094, pruned_loss=0.03151, audio_tagging_loss=0.008359, over 15016.00 frames. ], tot_loss[loss=0.07709, simple_loss=0.09717, pruned_loss=0.01871, audio_tagging_loss=0.009792, over 3056314.79 frames. ], batch size: 56, lr: 4.64e-03, grad_scale: 16.0 2023-11-20 15:27:28,517 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 171100 2023-11-20 15:27:36,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1140666.6666666667, ans=0.0 2023-11-20 15:28:00,052 WARNING [train_asr.py:1506] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 15:28:00,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1140800.0, ans=0.125 2023-11-20 15:28:03,121 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.13 vs. limit=22.5 2023-11-20 15:28:09,492 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 2800, loss[loss=0.08029, simple_loss=0.09844, pruned_loss=0.02207, audio_tagging_loss=0.009001, over 15505.00 frames. ], tot_loss[loss=0.07785, simple_loss=0.09834, pruned_loss=0.01891, audio_tagging_loss=0.009767, over 3048080.66 frames. ], batch size: 57, lr: 4.64e-03, grad_scale: 32.0 2023-11-20 15:28:21,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1140933.3333333333, ans=0.0 2023-11-20 15:28:23,478 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.091e+01 8.017e+01 8.655e+01 9.428e+01 1.274e+02, threshold=1.731e+02, percent-clipped=0.0 2023-11-20 15:28:32,329 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 171150 2023-11-20 15:29:13,873 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 2850, loss[loss=0.06295, simple_loss=0.07067, pruned_loss=0.01627, audio_tagging_loss=0.01134, over 14331.00 frames. 
], tot_loss[loss=0.07754, simple_loss=0.09801, pruned_loss=0.01879, audio_tagging_loss=0.009746, over 3044209.30 frames. ], batch size: 57, lr: 4.64e-03, grad_scale: 32.0 2023-11-20 15:29:21,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1141200.0, ans=0.1 2023-11-20 15:29:37,357 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 171200 2023-11-20 15:29:45,762 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=15.0 2023-11-20 15:30:18,051 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 2900, loss[loss=0.07667, simple_loss=0.1052, pruned_loss=0.01515, audio_tagging_loss=0.008941, over 15381.00 frames. ], tot_loss[loss=0.07839, simple_loss=0.09933, pruned_loss=0.01897, audio_tagging_loss=0.009752, over 3045154.21 frames. ], batch size: 55, lr: 4.64e-03, grad_scale: 32.0 2023-11-20 15:30:32,494 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.283e+01 7.914e+01 8.602e+01 9.219e+01 1.779e+02, threshold=1.720e+02, percent-clipped=1.0 2023-11-20 15:30:42,577 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 171250 2023-11-20 15:31:11,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1141800.0, ans=0.0 2023-11-20 15:31:14,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1141800.0, ans=0.125 2023-11-20 15:31:19,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1141800.0, ans=0.0 2023-11-20 15:31:23,105 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 2950, loss[loss=0.06346, simple_loss=0.07973, pruned_loss=0.01365, audio_tagging_loss=0.00994, over 15629.00 frames. ], tot_loss[loss=0.07892, simple_loss=0.1001, pruned_loss=0.0192, audio_tagging_loss=0.009661, over 3043040.72 frames. ], batch size: 60, lr: 4.64e-03, grad_scale: 32.0 2023-11-20 15:31:28,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1141866.6666666667, ans=0.2 2023-11-20 15:31:37,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1141933.3333333333, ans=0.1 2023-11-20 15:31:46,759 INFO [model.py:792] (0/4) Freeze_encoder: False; Current batch idx: 171300 2023-11-20 15:32:14,909 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.70 vs. limit=15.0 2023-11-20 15:32:28,325 INFO [train_asr.py:1262] (0/4) Epoch 15, batch 3000, loss[loss=0.06432, simple_loss=0.08821, pruned_loss=0.01152, audio_tagging_loss=0.008694, over 15297.00 frames. ], tot_loss[loss=0.07885, simple_loss=0.09979, pruned_loss=0.01911, audio_tagging_loss=0.009847, over 3043632.09 frames. ], batch size: 57, lr: 4.63e-03, grad_scale: 32.0 2023-11-20 15:32:28,329 INFO [train_asr.py:1285] (0/4) Computing validation loss
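How to read the per-batch loss lines above: loss[...] is the loss on the current batch and tot_loss[...] is a running average over the frames seen so far. The printed totals are consistent with a weighted sum of the components in which the simple (linear) transducer loss is down-weighted by 0.5 and the pruned and audio-tagging losses enter at full weight; for example, at epoch 15 batch 1100, 0.5 * 0.09698 + 0.01832 + 0.01001 = 0.07682, which matches the logged loss exactly. A minimal sketch of that combination (the weights are inferred from the printed numbers, not quoted from the training script):

# Hedged sketch: combine loss components the way the logged totals imply.
# The weights are assumptions inferred from the numbers above.
SIMPLE_LOSS_SCALE = 0.5     # assumed weight on the "simple" transducer loss
AUDIO_TAGGING_SCALE = 1.0   # assumed weight on the audio-tagging loss

def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float) -> float:
    return (SIMPLE_LOSS_SCALE * simple_loss
            + pruned_loss
            + AUDIO_TAGGING_SCALE * audio_tagging_loss)

# Reproduces the "Epoch 15, batch 1100" line above.
assert abs(combined_loss(0.09698, 0.01832, 0.01001) - 0.07682) < 1e-5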
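The recurring WARNING lines that exclude cuts such as unbalanced/AWHnJAqurec_0.000_1.000.wav all follow one pattern: 100 input frames shrink to 23 after the encoder's subsampling, while the dummy placeholder transcript tokenizes to 24 BPE tokens, so the encoder emits fewer frames than there are target tokens and no monotonic transducer alignment exists. A minimal sketch of that kind of validity check (the function name is illustrative; only the frames-vs-tokens comparison is taken from the log):

def is_trainable_cut(num_frames_after_subsampling: int, num_tokens: int) -> bool:
    """Transducer-style training needs at least as many encoder frames as
    target tokens; 23 frames vs. 24 tokens fails this test, matching the
    excluded AudioSet cuts in the warnings above."""
    return num_frames_after_subsampling >= num_tokens

# The excluded cuts: 100 raw frames -> 23 after subsampling, 24 tokens.
assert not is_trainable_cut(23, 24)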
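The optim.py lines report grad-norm quartiles (min, 25%, median, 75%, max) together with a clipping threshold. Throughout this section the threshold tracks twice the reported median, matching the printed Clipping_scale=2.0: e.g. 2.0 * 8.962e+01 = 1.792e+02 and 2.0 * 8.897e+01 ≈ 1.779e+02. A minimal sketch of a median-based clipping rule of that shape (the real optimizer's warm-up handling and running statistics are not reproduced here):

import statistics

def clipping_threshold(recent_grad_norms, clipping_scale: float = 2.0) -> float:
    """Clip to a multiple of the median gradient norm observed recently,
    so the threshold adapts to the model rather than being a fixed constant."""
    return clipping_scale * statistics.median(recent_grad_norms)

norms = [64.7, 81.8, 89.0, 96.8, 149.3]   # shaped like one quartile line above
print(clipping_threshold(norms))           # 178.0, comparable to the logged thresholds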
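The grad_scale field in the batch lines is the dynamic loss scale used for fp16 training; it sits at 16.0 early in the section and doubles to 32.0 around batch 1200, the usual grow-on-stable / halve-on-overflow behaviour of dynamic loss scaling. A minimal, self-contained sketch of that mechanism using torch.cuda.amp.GradScaler (the model, data, and growth interval below are placeholder assumptions, not the run's actual configuration):

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(80, 500).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.045)
# Grows the scale 2x after `growth_interval` overflow-free steps and halves
# it on overflow -- the same pattern as the 16.0 -> 32.0 jump logged above.
scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000,
                                   enabled=(device == "cuda"))

x = torch.randn(4, 80, device=device)
with torch.autocast(device_type=device, enabled=(device == "cuda")):
    loss = model(x).square().mean()
scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()
print(scaler.get_scale())  # current dynamic loss scale (1.0 when disabled on CPU)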
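The bulk of this section is scaling.py reporting ScheduledFloat values (dropout probabilities, skip rates, balancer limits) as a function of batch_count, which is above 1.12M here: each "ans=..." is the current point on a batch-count-keyed schedule, and the many skip rates at 0.0 indicate schedules that have long since decayed to their final value. A minimal sketch of a piecewise-linear schedule of that kind (an illustrative stand-in, not icefall's actual ScheduledFloat implementation):

class PiecewiseLinearSchedule:
    """Maps a batch count to a float by linear interpolation between
    (batch_count, value) breakpoints, clamping outside the range."""
    def __init__(self, *points):
        self.points = sorted(points)  # (batch_count, value) pairs

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a hypothetical skip rate decaying from 0.5 to 0.0 over 4000 batches,
# then flat -- consistent with the "...skip_rate ... ans=0.0" lines above,
# where batch_count is far past any warm-up.
skip_rate = PiecewiseLinearSchedule((0, 0.5), (4000, 0.0))
assert skip_rate(1_129_400) == 0.0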