2023-11-25 20:20:52,085 INFO [train_asr.py:1303] (0/4) Training started
2023-11-25 20:20:52,091 INFO [train_asr.py:1313] (0/4) Device: cuda:0
2023-11-25 20:20:52,093 INFO [train_asr.py:1325] (0/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2b2ac14b326d61d79d04e53fbd69b1ff6d630411', 'k2-git-date': 'Thu Aug 24 05:58:26 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'multi_KD', 'icefall-git-sha1': 'a9ea720f-dirty', 'icefall-git-date': 'Wed Nov 22 17:48:49 2023', 'icefall-path': '/star-xy/softwares/icefall_development/icefall_multi_KD', 'k2-path': '/star-xy/softwares/k2_development/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-xy/softwares/anaconda3/envs/multi_KD/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-10-1125112954-6d844cbdd8-m6xmg', 'IP address': '10.177.94.19'}, 'world_size': 4, 'master_port': 13490, 'tensorboard': True, 'num_epochs': 50, 'start_epoch': 39, 'start_batch': 0, 'exp_dir': PosixPath('multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'ctc_loss_scale': 0.2, 'audio_tagging_loss_scale': 1.0, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'stop_early': False, 'do_finetune': False, 'init_modules': None, 'freeze_modules': None, 'finetune_ckpt': None, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'do_audio_tagging': True, 'use_encoder_projection': False, 'encoder_projection_dim': -1, 'freeze_encoder': False, 'freezing_encoder_layer_index': '-1', 'freeze_encoder_steps': -1, 'encoder_lr_scale': 1.0, 'beats_label': True, 'full_libri': True, 'mini_libri': False, 'use_vox2': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_audioset': True, 'audioset_subset': 'unbalanced', 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'small.en', 'blank_id': 0, 'vocab_size': 500}
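A quick way to sanity-check the learning rate reported in the per-batch lines below (lr: 1.75e-03) is the Eden schedule driven by the base_lr, lr_batches and lr_epochs values in the dump above. A minimal sketch, assuming the Eden formula from icefall's optim.py with the warm-up factor omitted (warm_step=2000 is long past at this point in the run):

```python
# Sketch of the Eden learning-rate schedule (icefall optim.py), warm-up omitted.
def eden_lr(base_lr: float, step: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    step_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * step_factor * epoch_factor

# With base_lr=0.045 from the config, step ~457000 (see the "Current batch idx"
# lines below) and 38 finished epochs, this lands on the logged value:
print(round(eden_lr(0.045, 457_000, 38.0), 5))  # -> 0.00175, i.e. lr: 1.75e-03
```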
2023-11-25 20:20:52,094 INFO [train_asr.py:1334] (0/4) About to create model
2023-11-25 20:20:52,761 INFO [train_asr.py:1338] (0/4) Number of model parameters: 65819362
2023-11-25 20:20:53,244 INFO [train_asr.py:1362] (0/4) Using CED labels!
2023-11-25 20:20:53,244 INFO [checkpoint.py:112] (0/4) Loading checkpoint from multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-38.pt
2023-11-25 20:20:56,555 INFO [checkpoint.py:131] (0/4) Loading averaged model
2023-11-25 20:20:56,690 INFO [train_asr.py:1370] (0/4) Setting the lr scale of parameters in encoder and encoder_embed to 1.0
2023-11-25 20:20:59,952 INFO [train_asr.py:1379] (0/4) Using DDP
2023-11-25 20:21:00,337 INFO [train_asr.py:1402] (0/4) Loading optimizer state dict
2023-11-25 20:21:00,811 INFO [train_asr.py:1410] (0/4) Loading scheduler state dict
2023-11-25 20:21:00,876 INFO [train_asr.py:1432] (0/4) Getting audioset cuts
2023-11-25 20:21:00,877 INFO [kd_datamodule.py:784] (0/4) About to get the audioset cuts.
2023-11-25 20:21:00,964 INFO [train_asr.py:1438] (0/4) Using mux to combine Librispeech with audioset
2023-11-25 20:21:00,964 INFO [train_asr.py:1449] (0/4) CutSet(len=2748469) [underlying data type: ]
2023-11-25 20:21:09,897 INFO [kd_datamodule.py:396] (0/4) Enable MUSAN
2023-11-25 20:21:09,897 INFO [kd_datamodule.py:397] (0/4) About to get Musan cuts
2023-11-25 20:21:12,402 INFO [kd_datamodule.py:427] (0/4) Enable SpecAugment
2023-11-25 20:21:12,402 INFO [kd_datamodule.py:428] (0/4) Time warp factor: 80
2023-11-25 20:21:12,402 INFO [kd_datamodule.py:438] (0/4) Num frame mask: 10
2023-11-25 20:21:12,403 INFO [kd_datamodule.py:451] (0/4) About to create train dataset
2023-11-25 20:21:12,426 INFO [kd_datamodule.py:487] (0/4) Using SimpleCutSampler
2023-11-25 20:21:12,426 INFO [kd_datamodule.py:495] (0/4) About to create train dataloader
2023-11-25 20:21:12,465 INFO [kd_datamodule.py:802] (0/4) About to get the audioset eval cuts.
2023-11-25 20:21:12,484 INFO [train_asr.py:1513] (0/4) CutSet(len=20681) [underlying data type: ]
2023-11-25 20:21:12,539 INFO [kd_datamodule.py:529] (0/4) About to create dev dataset
2023-11-25 20:21:12,981 INFO [kd_datamodule.py:550] (0/4) About to create dev dataloader
2023-11-25 20:21:12,981 INFO [train_asr.py:1527] (0/4) Loading grad scaler state dict
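Between "Getting audioset cuts" and "About to create train dataloader" the datamodule assembles a fairly standard lhotse pipeline. A minimal sketch with lhotse 1.16 (the version in the env dump above); the manifest paths are hypothetical, and the recipe's kd_datamodule.py layers KD-specific transforms (MUSAN mixing, teacher targets) on top of this skeleton:

```python
from torch.utils.data import DataLoader

from lhotse import CutSet
from lhotse.dataset import K2SpeechRecognitionDataset, SimpleCutSampler, SpecAugment

# Hypothetical manifest names; the run reads precomputed fbank cuts from data/fbank.
libri = CutSet.from_jsonl_lazy("data/fbank/librispeech_cuts_train.jsonl.gz")
audioset = CutSet.from_jsonl_lazy("data/fbank/audioset_cuts_unbalanced.jsonl.gz")

# Lazy interleaving of the two corpora, as in "Using mux to combine ..." above.
train_cuts = CutSet.mux(libri, audioset)

dataset = K2SpeechRecognitionDataset(
    input_transforms=[SpecAugment(time_warp_factor=80, num_frame_masks=10)],
    return_cuts=True,  # return_cuts=True in the config dump
)
sampler = SimpleCutSampler(train_cuts, max_duration=1000.0,
                           shuffle=True, drop_last=True)
train_dl = DataLoader(dataset, sampler=sampler, batch_size=None, num_workers=2)
```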
2023-11-25 20:21:48,363 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 0, loss[loss=0.1346, simple_loss=0.08696, pruned_loss=0.009115, audio_tagging_loss=0.08203, over 15170.00 frames. ], tot_loss[loss=0.1346, simple_loss=0.08696, pruned_loss=0.009115, audio_tagging_loss=0.08203, over 15170.00 frames. ], batch size: 58, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:21:48,366 INFO [train_asr.py:1258] (0/4) Computing validation loss
2023-11-25 20:22:00,649 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.1470, 4.9348, 3.9151, 4.4063], device='cuda:0')
2023-11-25 20:22:03,481 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.5333, 4.4996, 4.2730, 4.3984], device='cuda:0')
2023-11-25 20:22:08,263 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9533, 3.1578, 2.8643, 3.1483, 3.3572, 2.8082, 3.4062, 2.6017], device='cuda:0')
2023-11-25 20:22:20,740 INFO [train_asr.py:1267] (0/4) Epoch 39, validation: loss=0.127, simple_loss=0.05083, pruned_loss=0.005243, audio_tagging_loss=0.09629, over 4681554.00 frames.
2023-11-25 20:22:20,741 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB
2023-11-25 20:22:25,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3046020.0, ans=0.125
2023-11-25 20:22:29,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3046020.0, ans=0.125
2023-11-25 20:22:33,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3046086.6666666665, ans=0.0
2023-11-25 20:22:36,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3046086.6666666665, ans=0.2
2023-11-25 20:22:41,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3046086.6666666665, ans=0.125
2023-11-25 20:23:09,207 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.46 vs. limit=22.5
2023-11-25 20:23:10,928 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 456950
2023-11-25 20:23:16,256 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 50, loss[loss=0.09638, simple_loss=0.1129, pruned_loss=0.01642, audio_tagging_loss=0.02352, over 15523.00 frames. ], tot_loss[loss=0.1011, simple_loss=0.09223, pruned_loss=0.01276, audio_tagging_loss=0.04218, over 686606.58 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:23:16,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3046353.3333333335, ans=0.1
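The loss[...] tuples above decompose the total into the components set up in the config (simple_loss_scale=0.5, audio_tagging_loss_scale=1.0, use_ctc=False so no CTC term). A minimal sketch of the combination, with names mirroring the logged keys rather than the actual train_asr.py code; plugging in the batch-0 values reproduces the logged total, and the validation line (0.5 * 0.05083 + 0.005243 + 0.09629 = 0.127) checks out the same way:

```python
# Hedged sketch: how the logged loss components appear to combine.
def total_loss(simple_loss: float, pruned_loss: float, audio_tagging_loss: float,
               simple_loss_scale: float = 0.5,
               audio_tagging_loss_scale: float = 1.0) -> float:
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Batch-0 values from the line above:
print(round(total_loss(0.08696, 0.009115, 0.08203), 4))  # -> 0.1346, the logged loss
```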
2023-11-25 20:23:36,573 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.21 vs. limit=15.0
2023-11-25 20:23:39,126 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.865e+01 9.525e+01 1.035e+02 1.246e+02 6.272e+02, threshold=2.069e+02, percent-clipped=17.0
2023-11-25 20:23:43,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3046486.6666666665, ans=0.125
2023-11-25 20:23:52,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3046553.3333333335, ans=0.0
2023-11-25 20:23:56,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3046553.3333333335, ans=0.0
2023-11-25 20:24:00,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3046620.0, ans=0.1
2023-11-25 20:24:06,630 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457000
2023-11-25 20:24:12,185 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 100, loss[loss=0.08573, simple_loss=0.09689, pruned_loss=0.01247, audio_tagging_loss=0.02481, over 14580.00 frames. ], tot_loss[loss=0.09352, simple_loss=0.09207, pruned_loss=0.01233, audio_tagging_loss=0.03516, over 1205874.49 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:24:13,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3046686.6666666665, ans=0.125
2023-11-25 20:24:34,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3046820.0, ans=0.1
2023-11-25 20:24:42,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3046886.6666666665, ans=0.0
2023-11-25 20:25:00,689 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457050
2023-11-25 20:25:05,938 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 150, loss[loss=0.09134, simple_loss=0.1153, pruned_loss=0.0167, audio_tagging_loss=0.01699, over 14399.00 frames. ], tot_loss[loss=0.08596, simple_loss=0.0897, pruned_loss=0.01206, audio_tagging_loss=0.02905, over 1618580.66 frames. ], batch size: 52, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:25:10,887 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.37 vs. limit=15.0
2023-11-25 20:25:28,189 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.181e+01 8.752e+01 9.435e+01 1.031e+02 1.991e+02, threshold=1.887e+02, percent-clipped=0.0
2023-11-25 20:25:55,155 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457100
2023-11-25 20:26:00,390 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 200, loss[loss=0.07025, simple_loss=0.08991, pruned_loss=0.01515, audio_tagging_loss=0.01015, over 16247.00 frames. ], tot_loss[loss=0.08098, simple_loss=0.09003, pruned_loss=0.01235, audio_tagging_loss=0.02362, over 1932448.81 frames. ], batch size: 62, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:26:08,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3047353.3333333335, ans=0.1
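The optim.py lines summarize the gradient clipping inside icefall's ScaledAdam. The five numbers read as min/25%/median/75%/max of recent per-batch gradient norms, and the threshold appears to be clipping_scale times the median: in the first line above, 2.0 * 1.035e+02 = 2.069e+02, exactly the logged threshold, with 17% of recent batches exceeding it. A sketch of that bookkeeping; the exact windowing inside ScaledAdam may differ:

```python
# Hedged sketch of the logged clipping statistics.
import torch

def clipping_summary(recent_grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    quartiles = torch.quantile(recent_grad_norms,
                               torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2]  # 2.0 * median, e.g. 2.069e+02 above
    percent_clipped = 100.0 * (recent_grad_norms > threshold).float().mean()
    return quartiles, threshold, percent_clipped
```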
2023-11-25 20:26:33,847 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.62 vs. limit=15.0
2023-11-25 20:26:36,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3047553.3333333335, ans=0.125
2023-11-25 20:26:44,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3047620.0, ans=0.05
2023-11-25 20:26:50,191 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457150
2023-11-25 20:26:51,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3047620.0, ans=0.125
2023-11-25 20:26:53,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3047620.0, ans=0.125
2023-11-25 20:26:55,836 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 250, loss[loss=0.06709, simple_loss=0.08402, pruned_loss=0.01204, audio_tagging_loss=0.01304, over 16100.00 frames. ], tot_loss[loss=0.07804, simple_loss=0.09084, pruned_loss=0.01266, audio_tagging_loss=0.01996, over 2177459.10 frames. ], batch size: 60, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:26:57,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3047686.6666666665, ans=0.2
2023-11-25 20:27:09,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3047753.3333333335, ans=0.125
2023-11-25 20:27:16,740 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 9.232e+01 9.778e+01 1.082e+02 1.251e+02, threshold=1.956e+02, percent-clipped=0.0
2023-11-25 20:27:18,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=3047820.0, ans=0.02
2023-11-25 20:27:21,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3047820.0, ans=0.2
2023-11-25 20:27:24,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3047820.0, ans=0.125
2023-11-25 20:27:44,428 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457200
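The tot_loss[... over N frames] figures are not a plain cumulative average: the frame count climbs from ~0.69M at batch 50 toward a plateau near 3.0M (about 200 batches' worth of ~15k frames) instead of growing without bound. That behaviour matches running sums decayed by (1 - 1/reset_interval) per batch, with reset_interval=200 from the config. A sketch under that assumption; the actual bookkeeping lives in icefall's MetricsTracker:

```python
# Hedged sketch of the decayed running sums behind "tot_loss[... over N frames]".
def update_tot(tot_sum: float, tot_frames: float, batch_loss: float,
               batch_frames: float, reset_interval: int = 200):
    decay = 1.0 - 1.0 / reset_interval
    tot_sum = tot_sum * decay + batch_loss * batch_frames
    tot_frames = tot_frames * decay + batch_frames
    return tot_sum, tot_frames, tot_sum / tot_frames  # last value: the logged tot_loss

tot_sum = tot_frames = 0.0
for _ in range(800):  # ~15.2k frames per batch, as in the loss lines
    tot_sum, tot_frames, avg = update_tot(tot_sum, tot_frames, 0.07, 15_200.0)
print(round(tot_frames))  # ~2.98M, near the plateau the log shows by batch 800
```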
2023-11-25 20:27:50,100 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 300, loss[loss=0.07284, simple_loss=0.1027, pruned_loss=0.01428, audio_tagging_loss=0.007218, over 14989.00 frames. ], tot_loss[loss=0.0754, simple_loss=0.09065, pruned_loss=0.01264, audio_tagging_loss=0.01744, over 2372209.72 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:27:55,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3048020.0, ans=0.125
2023-11-25 20:27:55,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3048020.0, ans=0.125
2023-11-25 20:28:16,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3048153.3333333335, ans=0.125
2023-11-25 20:28:16,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3048153.3333333335, ans=0.0
2023-11-25 20:28:17,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3048153.3333333335, ans=0.1
2023-11-25 20:28:18,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3048153.3333333335, ans=0.0
2023-11-25 20:28:29,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3048220.0, ans=0.1
2023-11-25 20:28:38,394 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457250
2023-11-25 20:28:39,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3048286.6666666665, ans=0.1
2023-11-25 20:28:43,509 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 350, loss[loss=0.07451, simple_loss=0.09576, pruned_loss=0.01558, audio_tagging_loss=0.01105, over 16026.00 frames. ], tot_loss[loss=0.07336, simple_loss=0.09013, pruned_loss=0.01255, audio_tagging_loss=0.01575, over 2523210.04 frames. ], batch size: 58, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:29:00,703 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.36 vs. limit=12.0
2023-11-25 20:29:02,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3048420.0, ans=0.07
2023-11-25 20:29:02,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3048420.0, ans=0.125
2023-11-25 20:29:05,854 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.598e+01 8.941e+01 9.403e+01 1.014e+02 1.528e+02, threshold=1.881e+02, percent-clipped=0.0
2023-11-25 20:29:23,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3048553.3333333335, ans=0.0
2023-11-25 20:29:31,016 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-25 20:29:31,908 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457300
2023-11-25 20:29:38,119 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 400, loss[loss=0.08772, simple_loss=0.1189, pruned_loss=0.01949, audio_tagging_loss=0.008786, over 15643.00 frames. ], tot_loss[loss=0.07286, simple_loss=0.09112, pruned_loss=0.01278, audio_tagging_loss=0.01452, over 2640823.12 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 32.0
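The many scaling.py:213 lines track ScheduledFloat parameters: module constants such as dropout probabilities, balancer probs and skip rates that are not learned but follow a schedule over the global batch count, with "ans" being the current value. A sketch of the idea, assuming the piecewise-linear interpolation that icefall's scaling.py describes; the breakpoints below are illustrative, not the recipe's actual schedules:

```python
# Hedged sketch of a ScheduledFloat: piecewise-linear in the global batch count.
def scheduled_float(batch_count: float, schedule: list[tuple[float, float]]) -> float:
    (x0, y0), *_ = schedule
    if batch_count <= x0:
        return y0
    for (xa, ya), (xb, yb) in zip(schedule, schedule[1:]):
        if batch_count <= xb:
            return ya + (yb - ya) * (batch_count - xa) / (xb - xa)
    return schedule[-1][1]  # past the last breakpoint, hold the final value

# e.g. a dropout that anneals from 0.3 to 0.1 over the first 20k batches; by
# batch_count ~3.05M it has long since settled at 0.1, like the ans=0.1 lines:
print(scheduled_float(3_046_353.0, [(0.0, 0.3), (20_000.0, 0.1)]))  # -> 0.1
```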
2023-11-25 20:29:38,623 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.75 vs. limit=22.5
2023-11-25 20:29:43,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3048686.6666666665, ans=0.015
2023-11-25 20:29:54,973 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. limit=6.0
2023-11-25 20:30:00,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3048820.0, ans=6.0
2023-11-25 20:30:26,443 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457350
2023-11-25 20:30:32,090 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 450, loss[loss=0.06854, simple_loss=0.0946, pruned_loss=0.01127, audio_tagging_loss=0.009965, over 14850.00 frames. ], tot_loss[loss=0.07127, simple_loss=0.09055, pruned_loss=0.0124, audio_tagging_loss=0.01359, over 2727001.52 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:30:52,807 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.321e+01 8.642e+01 9.445e+01 1.011e+02 1.527e+02, threshold=1.889e+02, percent-clipped=0.0
2023-11-25 20:31:05,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3049220.0, ans=0.0
2023-11-25 20:31:20,692 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457400
2023-11-25 20:31:25,806 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.73 vs. limit=12.0
2023-11-25 20:31:26,191 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 500, loss[loss=0.06647, simple_loss=0.09373, pruned_loss=0.00927, audio_tagging_loss=0.01033, over 14748.00 frames. ], tot_loss[loss=0.07074, simple_loss=0.0908, pruned_loss=0.01256, audio_tagging_loss=0.01278, over 2802507.20 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 20:31:42,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3049420.0, ans=0.125
2023-11-25 20:31:42,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3049420.0, ans=0.04949747468305833
2023-11-25 20:32:14,248 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457450
2023-11-25 20:32:14,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3049620.0, ans=0.2
2023-11-25 20:32:18,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3049620.0, ans=0.125
2023-11-25 20:32:20,578 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 550, loss[loss=0.06649, simple_loss=0.09211, pruned_loss=0.01024, audio_tagging_loss=0.0102, over 14507.00 frames. ], tot_loss[loss=0.07075, simple_loss=0.09124, pruned_loss=0.01283, audio_tagging_loss=0.01229, over 2857163.85 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:32:22,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3049686.6666666665, ans=0.0
2023-11-25 20:32:23,113 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.75 vs. limit=15.0
2023-11-25 20:32:30,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3049753.3333333335, ans=0.0
2023-11-25 20:32:42,782 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.908e+01 9.696e+01 1.036e+02 1.301e+02, threshold=1.939e+02, percent-clipped=0.0
2023-11-25 20:32:45,417 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.89 vs. limit=15.0
2023-11-25 20:32:51,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3049886.6666666665, ans=0.2
2023-11-25 20:32:51,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3049886.6666666665, ans=0.1
2023-11-25 20:33:08,825 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457500
2023-11-25 20:33:13,976 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 600, loss[loss=0.07607, simple_loss=0.09542, pruned_loss=0.01763, audio_tagging_loss=0.01073, over 14755.00 frames. ], tot_loss[loss=0.07049, simple_loss=0.09148, pruned_loss=0.01283, audio_tagging_loss=0.01192, over 2903554.15 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:33:16,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3050020.0, ans=0.0
2023-11-25 20:33:19,227 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.50 vs. limit=15.0
2023-11-25 20:33:37,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3050153.3333333335, ans=0.2
2023-11-25 20:34:02,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3050286.6666666665, ans=0.125
2023-11-25 20:34:02,986 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457550
2023-11-25 20:34:08,176 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 650, loss[loss=0.05652, simple_loss=0.07739, pruned_loss=0.009318, audio_tagging_loss=0.008508, over 14324.00 frames. ], tot_loss[loss=0.07006, simple_loss=0.09144, pruned_loss=0.01288, audio_tagging_loss=0.01146, over 2929561.86 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:34:15,173 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.11 vs. limit=15.0
2023-11-25 20:34:28,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3050420.0, ans=0.2
2023-11-25 20:34:31,123 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.625e+01 8.926e+01 9.451e+01 1.004e+02 1.811e+02, threshold=1.890e+02, percent-clipped=0.0
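The Whitening lines report a diagnostic from scaling.py's Whiten module, which nudges activations toward a well-conditioned covariance. As I read it, the metric is the ratio of the mean squared covariance eigenvalue to the squared mean eigenvalue: 1.0 for perfectly "white" (isotropic) features, larger as the covariance becomes lopsided, with a corrective gradient applied only when it exceeds the limit (so "metric=6.75 vs. limit=15.0" above means no intervention). A sketch under that interpretation, which may differ in detail from the actual scaling.py:

```python
# Hedged sketch of the logged whitening metric, via trace and Frobenius norm.
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels), a single whitening group
    c = x.t() @ x / x.shape[0]            # covariance estimate
    d = c.shape[0]
    mean_eig = torch.diagonal(c).mean()   # trace(C)/d = mean eigenvalue
    mean_eig_sq = (c * c).sum() / d       # ||C||_F^2 / d = mean squared eigenvalue
    return mean_eig_sq / (mean_eig ** 2 + 1e-20)  # 1.0 iff C is isotropic
```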
2023-11-25 20:34:33,677 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.09 vs. limit=15.0
2023-11-25 20:34:34,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3050486.6666666665, ans=0.125
2023-11-25 20:34:47,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=3050553.3333333335, ans=22.5
2023-11-25 20:34:52,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3050620.0, ans=0.0
2023-11-25 20:34:56,695 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457600
2023-11-25 20:34:59,902 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.55 vs. limit=15.0
2023-11-25 20:35:02,819 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 700, loss[loss=0.07372, simple_loss=0.1013, pruned_loss=0.0133, audio_tagging_loss=0.009788, over 16527.00 frames. ], tot_loss[loss=0.06931, simple_loss=0.09073, pruned_loss=0.01272, audio_tagging_loss=0.01122, over 2961079.00 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:35:08,098 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0
2023-11-25 20:35:23,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3050820.0, ans=0.2
2023-11-25 20:35:24,733 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-25 20:35:24,947 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.38 vs. limit=15.0
2023-11-25 20:35:29,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3050820.0, ans=0.0
2023-11-25 20:35:43,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3050886.6666666665, ans=0.2
2023-11-25 20:35:52,122 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457650
2023-11-25 20:35:57,279 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 750, loss[loss=0.06485, simple_loss=0.08702, pruned_loss=0.0115, audio_tagging_loss=0.009839, over 15854.00 frames. ], tot_loss[loss=0.06886, simple_loss=0.09015, pruned_loss=0.01275, audio_tagging_loss=0.01103, over 2985279.35 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:36:16,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3051086.6666666665, ans=0.1
2023-11-25 20:36:19,683 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.528e+01 8.878e+01 9.438e+01 1.006e+02 1.228e+02, threshold=1.888e+02, percent-clipped=0.0
2023-11-25 20:36:32,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3051220.0, ans=0.0
2023-11-25 20:36:45,752 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457700
2023-11-25 20:36:46,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3051286.6666666665, ans=0.125
2023-11-25 20:36:51,357 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 800, loss[loss=0.0738, simple_loss=0.09702, pruned_loss=0.01565, audio_tagging_loss=0.009641, over 14442.00 frames. ], tot_loss[loss=0.06914, simple_loss=0.0906, pruned_loss=0.01298, audio_tagging_loss=0.01086, over 2998126.38 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 20:36:56,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3051353.3333333335, ans=0.1
2023-11-25 20:37:05,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3051420.0, ans=0.2
2023-11-25 20:37:19,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3051486.6666666665, ans=0.1
2023-11-25 20:37:23,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3051553.3333333335, ans=0.1
2023-11-25 20:37:33,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3051553.3333333335, ans=0.125
2023-11-25 20:37:40,150 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457750
2023-11-25 20:37:45,296 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 850, loss[loss=0.07625, simple_loss=0.1075, pruned_loss=0.01359, audio_tagging_loss=0.008894, over 15585.00 frames. ], tot_loss[loss=0.06867, simple_loss=0.0902, pruned_loss=0.0128, audio_tagging_loss=0.01077, over 3005604.34 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 20:38:04,347 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.21 vs. limit=10.0
2023-11-25 20:38:08,813 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.903e+01 8.812e+01 9.277e+01 9.996e+01 1.418e+02, threshold=1.855e+02, percent-clipped=0.0
2023-11-25 20:38:25,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3051886.6666666665, ans=0.125
2023-11-25 20:38:35,663 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457800
2023-11-25 20:38:36,092 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.14 vs. limit=15.0
2023-11-25 20:38:41,242 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 900, loss[loss=0.04487, simple_loss=0.06086, pruned_loss=0.004746, audio_tagging_loss=0.009688, over 14423.00 frames. ], tot_loss[loss=0.06825, simple_loss=0.08958, pruned_loss=0.01259, audio_tagging_loss=0.01087, over 3021991.70 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 20:38:47,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3052020.0, ans=0.125
2023-11-25 20:38:52,996 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-25 20:39:06,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3052153.3333333335, ans=0.125
2023-11-25 20:39:13,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3052220.0, ans=0.125
2023-11-25 20:39:22,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3052220.0, ans=0.125
2023-11-25 20:39:27,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3052286.6666666665, ans=0.04949747468305833
2023-11-25 20:39:29,927 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457850
2023-11-25 20:39:34,673 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.54 vs. limit=6.0
2023-11-25 20:39:35,117 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 950, loss[loss=0.09668, simple_loss=0.141, pruned_loss=0.01934, audio_tagging_loss=0.006851, over 14677.00 frames. ], tot_loss[loss=0.06842, simple_loss=0.09031, pruned_loss=0.01266, audio_tagging_loss=0.01061, over 3032557.09 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 20:39:43,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3052353.3333333335, ans=0.125
2023-11-25 20:39:54,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3052420.0, ans=0.125
2023-11-25 20:39:58,565 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.755e+01 8.948e+01 9.425e+01 1.001e+02 1.201e+02, threshold=1.885e+02, percent-clipped=0.0
2023-11-25 20:40:06,306 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.91 vs. limit=10.0
2023-11-25 20:40:08,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3052553.3333333335, ans=0.0
2023-11-25 20:40:24,109 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457900
2023-11-25 20:40:24,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3052620.0, ans=0.1
2023-11-25 20:40:27,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3052620.0, ans=0.125
2023-11-25 20:40:29,243 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1000, loss[loss=0.0752, simple_loss=0.09477, pruned_loss=0.01706, audio_tagging_loss=0.01075, over 14864.00 frames. ], tot_loss[loss=0.06857, simple_loss=0.09081, pruned_loss=0.01289, audio_tagging_loss=0.01027, over 3028059.60 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 20:40:52,827 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 20:41:03,776 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.26 vs. limit=12.0
2023-11-25 20:41:05,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3052886.6666666665, ans=0.125
2023-11-25 20:41:13,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3052953.3333333335, ans=0.0
2023-11-25 20:41:19,422 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 457950
2023-11-25 20:41:24,730 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1050, loss[loss=0.07673, simple_loss=0.1005, pruned_loss=0.01752, audio_tagging_loss=0.008984, over 15893.00 frames. ], tot_loss[loss=0.06844, simple_loss=0.09104, pruned_loss=0.01288, audio_tagging_loss=0.01005, over 3022985.32 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:41:30,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3053020.0, ans=0.0
2023-11-25 20:41:35,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3053086.6666666665, ans=0.2
2023-11-25 20:41:48,834 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 8.789e+01 9.440e+01 1.020e+02 1.231e+02, threshold=1.888e+02, percent-clipped=0.0
2023-11-25 20:41:53,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3053153.3333333335, ans=0.0
2023-11-25 20:41:54,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3053153.3333333335, ans=0.125
2023-11-25 20:42:02,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3053220.0, ans=0.1
2023-11-25 20:42:04,555 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.26 vs. limit=22.5
2023-11-25 20:42:08,777 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.14 vs. limit=15.0
2023-11-25 20:42:12,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3053286.6666666665, ans=0.0
2023-11-25 20:42:13,604 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458000
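The WARNING above (Exclude cut with ID unbalanced/5Y6u9AlD9S0...) is a length sanity check rejecting AudioSet cuts whose dummy transcript has more BPE tokens than the encoder produces output frames, which would make the transducer loss undefined. A sketch of the arithmetic, assuming the usual icefall convention that the Conv2d subsampling maps T input frames to ((T - 7) // 2 + 1) // 2 output frames; the helper name is hypothetical, and the recipes implement this inside a cut filter:

```python
# Hedged sketch of the "Exclude cut" check.
def is_valid_for_transducer(num_frames: int, num_tokens: int) -> bool:
    t = ((num_frames - 7) // 2 + 1) // 2  # frames after 4x subsampling: 100 -> 23
    return t >= num_tokens

print(is_valid_for_transducer(100, 24))  # False -> excluded, as in the warning
```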
2023-11-25 20:42:19,168 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1100, loss[loss=0.06011, simple_loss=0.06827, pruned_loss=0.01344, audio_tagging_loss=0.01253, over 15478.00 frames. ], tot_loss[loss=0.06769, simple_loss=0.08993, pruned_loss=0.01277, audio_tagging_loss=0.00996, over 3027764.04 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 20:42:21,253 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 20:42:23,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3053353.3333333335, ans=0.125
2023-11-25 20:42:26,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3053353.3333333335, ans=0.0
2023-11-25 20:42:41,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3053486.6666666665, ans=0.0
2023-11-25 20:42:41,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3053486.6666666665, ans=0.125
2023-11-25 20:42:44,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3053486.6666666665, ans=0.0
2023-11-25 20:42:54,646 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.42 vs. limit=6.0
2023-11-25 20:43:07,635 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458050
2023-11-25 20:43:12,715 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1150, loss[loss=0.07974, simple_loss=0.1223, pruned_loss=0.01309, audio_tagging_loss=0.005503, over 14691.00 frames. ], tot_loss[loss=0.06778, simple_loss=0.0906, pruned_loss=0.01271, audio_tagging_loss=0.009762, over 3029951.02 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 20:43:22,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3053686.6666666665, ans=0.0
2023-11-25 20:43:26,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3053753.3333333335, ans=0.125
2023-11-25 20:43:38,708 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.982e+01 8.733e+01 9.355e+01 1.009e+02 1.328e+02, threshold=1.871e+02, percent-clipped=0.0
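The token lists in these exclusion warnings come from the BPE model named in the config dump. A hedged way to reproduce them, assuming the standard sentencepiece Python API against that model file:

```python
# Reproducing the warnings' tokenization of the dummy AudioSet transcript.
import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.load("data/lang_bpe_500/bpe.model")  # 'bpe_model' from the config dump
pieces = sp.encode("Dummy text added as a place holder. Please ignore this if possible",
                   out_type=str)
print(len(pieces))  # the warnings report "Number of tokens: 24"
```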
2023-11-25 20:43:40,298 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0
2023-11-25 20:43:44,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3053820.0, ans=0.07
2023-11-25 20:43:47,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3053886.6666666665, ans=0.07
2023-11-25 20:43:52,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3053886.6666666665, ans=0.125
2023-11-25 20:44:02,218 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458100
2023-11-25 20:44:04,409 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.79 vs. limit=15.0
2023-11-25 20:44:08,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3054020.0, ans=0.0
2023-11-25 20:44:08,912 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1200, loss[loss=0.07076, simple_loss=0.09947, pruned_loss=0.01255, audio_tagging_loss=0.008471, over 15501.00 frames. ], tot_loss[loss=0.06801, simple_loss=0.09105, pruned_loss=0.01284, audio_tagging_loss=0.009647, over 3028804.37 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:44:16,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3054020.0, ans=0.1
2023-11-25 20:44:38,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3054153.3333333335, ans=0.125
2023-11-25 20:44:57,726 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458150
2023-11-25 20:45:02,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3054353.3333333335, ans=0.125
2023-11-25 20:45:02,922 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1250, loss[loss=0.06625, simple_loss=0.09768, pruned_loss=0.008375, audio_tagging_loss=0.009039, over 15453.00 frames. ], tot_loss[loss=0.06838, simple_loss=0.09157, pruned_loss=0.01301, audio_tagging_loss=0.009585, over 3034560.40 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:45:11,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3054353.3333333335, ans=0.1
2023-11-25 20:45:14,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3054420.0, ans=0.0
2023-11-25 20:45:24,791 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.76 vs. limit=12.0
2023-11-25 20:45:27,924 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.972e+01 9.172e+01 9.748e+01 1.035e+02 1.279e+02, threshold=1.950e+02, percent-clipped=0.0
2023-11-25 20:45:28,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3054486.6666666665, ans=0.125
2023-11-25 20:45:41,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3054553.3333333335, ans=0.2
2023-11-25 20:45:51,544 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458200
2023-11-25 20:45:57,094 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1300, loss[loss=0.06525, simple_loss=0.08595, pruned_loss=0.01214, audio_tagging_loss=0.01014, over 16120.00 frames. ], tot_loss[loss=0.06784, simple_loss=0.0908, pruned_loss=0.01275, audio_tagging_loss=0.009687, over 3036802.39 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:46:19,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3054820.0, ans=0.95
2023-11-25 20:46:20,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3054820.0, ans=0.0
2023-11-25 20:46:38,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3054886.6666666665, ans=0.125
2023-11-25 20:46:45,816 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458250
2023-11-25 20:46:52,099 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1350, loss[loss=0.05876, simple_loss=0.06674, pruned_loss=0.01429, audio_tagging_loss=0.0111, over 13921.00 frames. ], tot_loss[loss=0.06768, simple_loss=0.09053, pruned_loss=0.01281, audio_tagging_loss=0.009604, over 3036016.43 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:47:02,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3055086.6666666665, ans=0.125
2023-11-25 20:47:06,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3055086.6666666665, ans=0.125
2023-11-25 20:47:16,591 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.563e+01 8.731e+01 9.396e+01 1.007e+02 1.248e+02, threshold=1.879e+02, percent-clipped=0.0
2023-11-25 20:47:18,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3055153.3333333335, ans=0.0
2023-11-25 20:47:28,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3055220.0, ans=0.1
2023-11-25 20:47:29,780 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.71 vs. limit=10.0
2023-11-25 20:47:31,807 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 20:47:40,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3055286.6666666665, ans=0.125
2023-11-25 20:47:41,147 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458300
2023-11-25 20:47:46,332 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1400, loss[loss=0.07773, simple_loss=0.09404, pruned_loss=0.0193, audio_tagging_loss=0.01141, over 14677.00 frames. ], tot_loss[loss=0.06792, simple_loss=0.09073, pruned_loss=0.01288, audio_tagging_loss=0.009674, over 3042286.45 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:48:04,700 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.77 vs. limit=12.0
2023-11-25 20:48:07,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3055486.6666666665, ans=0.2
2023-11-25 20:48:19,683 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.40 vs. limit=15.0
2023-11-25 20:48:24,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3055553.3333333335, ans=0.95
2023-11-25 20:48:35,024 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458350
2023-11-25 20:48:40,190 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1450, loss[loss=0.08058, simple_loss=0.1121, pruned_loss=0.01687, audio_tagging_loss=0.007677, over 15319.00 frames. ], tot_loss[loss=0.06776, simple_loss=0.09039, pruned_loss=0.01279, audio_tagging_loss=0.009775, over 3047342.74 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:48:49,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3055753.3333333335, ans=0.125
2023-11-25 20:48:51,174 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.16 vs. limit=15.0
2023-11-25 20:48:57,021 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-25 20:49:04,277 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0
2023-11-25 20:49:05,880 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.671e+01 9.348e+01 1.019e+02 1.564e+02, threshold=1.870e+02, percent-clipped=0.0
2023-11-25 20:49:18,850 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.67 vs. limit=15.0
2023-11-25 20:49:28,714 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458400
2023-11-25 20:49:34,839 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1500, loss[loss=0.06546, simple_loss=0.08809, pruned_loss=0.01181, audio_tagging_loss=0.009608, over 15969.00 frames. ], tot_loss[loss=0.06786, simple_loss=0.09056, pruned_loss=0.0128, audio_tagging_loss=0.009782, over 3044625.59 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:50:01,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3056153.3333333335, ans=0.125
2023-11-25 20:50:03,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3056153.3333333335, ans=0.0
2023-11-25 20:50:20,545 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.53 vs. limit=15.0
2023-11-25 20:50:25,395 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458450
2023-11-25 20:50:30,452 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1550, loss[loss=0.0777, simple_loss=0.1058, pruned_loss=0.01736, audio_tagging_loss=0.007429, over 15177.00 frames. ], tot_loss[loss=0.06745, simple_loss=0.08982, pruned_loss=0.0127, audio_tagging_loss=0.00984, over 3039811.98 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:50:32,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3056353.3333333335, ans=0.125
2023-11-25 20:50:54,350 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.420e+01 8.789e+01 9.295e+01 1.002e+02 1.264e+02, threshold=1.859e+02, percent-clipped=0.0
2023-11-25 20:51:19,482 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458500
2023-11-25 20:51:24,687 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1600, loss[loss=0.06066, simple_loss=0.08044, pruned_loss=0.00917, audio_tagging_loss=0.01127, over 16102.00 frames. ], tot_loss[loss=0.0676, simple_loss=0.08995, pruned_loss=0.01275, audio_tagging_loss=0.009877, over 3039749.16 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 20:51:38,566 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.97 vs. limit=15.0
2023-11-25 20:51:43,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3056753.3333333335, ans=0.125
2023-11-25 20:51:54,273 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.74 vs. limit=15.0
2023-11-25 20:52:02,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3056886.6666666665, ans=0.0
2023-11-25 20:52:13,788 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458550
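With use_fp16=True, the per-batch "grad_scale" values behave like the dynamic loss scale of torch.cuda.amp.GradScaler: halving when a step overflows (32 -> 16 at batch 550, 16 -> 8 around batch 1100) and recovering after enough clean steps (back to 32 by batch 800 and again by batch 1600). A minimal sketch of such a step; the model, optimizer and batch are passed in, and the loss call stands in for the recipe's own loss computation:

```python
# Hedged sketch of an fp16 training step matching the logged grad_scale behaviour.
import torch

def fp16_step(model, optimizer, scaler: torch.cuda.amp.GradScaler, batch) -> float:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(batch)       # stands in for the recipe's loss computation
    scaler.scale(loss).backward()
    scaler.step(optimizer)        # skipped internally if any gradient overflowed
    scaler.update()               # grows or shrinks the scale
    return scaler.get_scale()     # the value logged as "grad_scale"
```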
2023-11-25 20:52:18,975 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1650, loss[loss=0.07165, simple_loss=0.09016, pruned_loss=0.01344, audio_tagging_loss=0.01314, over 14672.00 frames. ], tot_loss[loss=0.06804, simple_loss=0.09044, pruned_loss=0.01297, audio_tagging_loss=0.009846, over 3036582.56 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 20:52:21,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3057020.0, ans=0.0
2023-11-25 20:52:27,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3057020.0, ans=0.07
2023-11-25 20:52:29,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3057020.0, ans=0.125
2023-11-25 20:52:30,362 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.73 vs. limit=15.0
2023-11-25 20:52:44,994 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 8.829e+01 9.450e+01 1.011e+02 1.260e+02, threshold=1.890e+02, percent-clipped=0.0
2023-11-25 20:52:46,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3057153.3333333335, ans=0.125
2023-11-25 20:53:04,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3057286.6666666665, ans=0.0
2023-11-25 20:53:09,762 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458600
2023-11-25 20:53:15,266 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1700, loss[loss=0.05992, simple_loss=0.08057, pruned_loss=0.01123, audio_tagging_loss=0.008406, over 15402.00 frames. ], tot_loss[loss=0.06807, simple_loss=0.09065, pruned_loss=0.01297, audio_tagging_loss=0.009775, over 3035303.18 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:53:28,810 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.98 vs. limit=12.0
2023-11-25 20:54:05,015 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458650
2023-11-25 20:54:09,345 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-25 20:54:10,178 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1750, loss[loss=0.08462, simple_loss=0.1175, pruned_loss=0.01765, audio_tagging_loss=0.008229, over 15757.00 frames. ], tot_loss[loss=0.06795, simple_loss=0.09044, pruned_loss=0.01304, audio_tagging_loss=0.009691, over 3041714.12 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:54:11,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3057686.6666666665, ans=0.05
2023-11-25 20:54:13,845 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.45 vs. limit=10.0
2023-11-25 20:54:16,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3057686.6666666665, ans=0.125
2023-11-25 20:54:36,741 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.715e+01 8.599e+01 9.189e+01 9.882e+01 1.189e+02, threshold=1.838e+02, percent-clipped=0.0
2023-11-25 20:54:37,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3057820.0, ans=0.1
2023-11-25 20:54:45,195 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.58 vs. limit=15.0
2023-11-25 20:54:59,278 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458700
2023-11-25 20:55:02,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3057953.3333333335, ans=0.125
2023-11-25 20:55:04,395 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1800, loss[loss=0.06625, simple_loss=0.08518, pruned_loss=0.01245, audio_tagging_loss=0.01121, over 13598.00 frames. ], tot_loss[loss=0.0677, simple_loss=0.09058, pruned_loss=0.0129, audio_tagging_loss=0.009515, over 3045123.48 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:55:20,909 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-25 20:55:28,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3058153.3333333335, ans=0.1
2023-11-25 20:55:29,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3058153.3333333335, ans=0.2
2023-11-25 20:55:29,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3058153.3333333335, ans=0.1
2023-11-25 20:55:35,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3058153.3333333335, ans=0.1
2023-11-25 20:55:42,699 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.58 vs. limit=22.5
2023-11-25 20:55:54,760 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458750
2023-11-25 20:56:00,478 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1850, loss[loss=0.06565, simple_loss=0.08933, pruned_loss=0.009605, audio_tagging_loss=0.01138, over 14623.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.08971, pruned_loss=0.0127, audio_tagging_loss=0.009513, over 3038275.48 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:56:17,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3058420.0, ans=0.1
limit=15.0 2023-11-25 20:56:26,238 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.929e+01 8.557e+01 9.640e+01 1.041e+02 1.665e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-25 20:56:27,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3058486.6666666665, ans=0.125 2023-11-25 20:56:49,841 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458800 2023-11-25 20:56:50,242 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.19 vs. limit=15.0 2023-11-25 20:56:55,913 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1900, loss[loss=0.06218, simple_loss=0.0845, pruned_loss=0.009066, audio_tagging_loss=0.01087, over 16640.00 frames. ], tot_loss[loss=0.06743, simple_loss=0.0903, pruned_loss=0.01284, audio_tagging_loss=0.009437, over 3037921.84 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 20:56:58,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3058686.6666666665, ans=0.125 2023-11-25 20:57:36,138 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.24 vs. limit=15.0 2023-11-25 20:57:38,287 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.17 vs. limit=22.5 2023-11-25 20:57:39,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=3058953.3333333335, ans=0.2 2023-11-25 20:57:44,974 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458850 2023-11-25 20:57:50,114 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 1950, loss[loss=0.07935, simple_loss=0.1041, pruned_loss=0.0191, audio_tagging_loss=0.008188, over 15552.00 frames. ], tot_loss[loss=0.06733, simple_loss=0.09018, pruned_loss=0.01281, audio_tagging_loss=0.009431, over 3037659.63 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 20:57:53,385 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 20:57:57,498 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.92 vs. limit=15.0 2023-11-25 20:58:08,941 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.03 vs. 
limit=6.0 2023-11-25 20:58:16,758 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.741e+01 8.855e+01 9.248e+01 9.928e+01 1.852e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-25 20:58:19,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3059153.3333333335, ans=0.1 2023-11-25 20:58:30,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3059220.0, ans=10.0 2023-11-25 20:58:39,957 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458900 2023-11-25 20:58:43,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3059286.6666666665, ans=0.0 2023-11-25 20:58:46,043 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2000, loss[loss=0.0672, simple_loss=0.09658, pruned_loss=0.009613, audio_tagging_loss=0.009296, over 14922.00 frames. ], tot_loss[loss=0.06753, simple_loss=0.09041, pruned_loss=0.01293, audio_tagging_loss=0.009395, over 3041404.26 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 20:59:07,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3059486.6666666665, ans=0.0 2023-11-25 20:59:35,130 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 458950 2023-11-25 20:59:40,298 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2050, loss[loss=0.06812, simple_loss=0.09126, pruned_loss=0.01399, audio_tagging_loss=0.008503, over 15207.00 frames. ], tot_loss[loss=0.06807, simple_loss=0.09129, pruned_loss=0.01311, audio_tagging_loss=0.009312, over 3041131.32 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:00:00,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3059753.3333333335, ans=0.125 2023-11-25 21:00:08,235 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.735e+01 8.675e+01 9.370e+01 9.808e+01 1.405e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-25 21:00:13,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3059886.6666666665, ans=0.125 2023-11-25 21:00:29,817 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459000 2023-11-25 21:00:29,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3059953.3333333335, ans=0.1 2023-11-25 21:00:35,283 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2100, loss[loss=0.05549, simple_loss=0.06989, pruned_loss=0.00894, audio_tagging_loss=0.01161, over 14444.00 frames. ], tot_loss[loss=0.06751, simple_loss=0.09046, pruned_loss=0.01294, audio_tagging_loss=0.009337, over 3038674.82 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:00:36,824 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.09 vs. 
limit=15.0 2023-11-25 21:00:45,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3060086.6666666665, ans=0.125 2023-11-25 21:01:00,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3060153.3333333335, ans=0.125 2023-11-25 21:01:07,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3060220.0, ans=0.125 2023-11-25 21:01:22,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3060286.6666666665, ans=0.0 2023-11-25 21:01:24,399 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459050 2023-11-25 21:01:30,069 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2150, loss[loss=0.05245, simple_loss=0.07567, pruned_loss=0.005788, audio_tagging_loss=0.00883, over 15819.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09021, pruned_loss=0.01278, audio_tagging_loss=0.009309, over 3036327.77 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:01:35,219 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.54 vs. limit=12.0 2023-11-25 21:01:41,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3060420.0, ans=0.025 2023-11-25 21:01:57,770 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.80 vs. limit=15.0 2023-11-25 21:01:58,071 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.190e+01 8.469e+01 9.111e+01 9.697e+01 1.371e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-25 21:02:03,359 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 21:02:14,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3060620.0, ans=0.125 2023-11-25 21:02:19,659 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459100 2023-11-25 21:02:24,875 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2200, loss[loss=0.07234, simple_loss=0.08777, pruned_loss=0.01713, audio_tagging_loss=0.01132, over 14744.00 frames. ], tot_loss[loss=0.06778, simple_loss=0.09112, pruned_loss=0.0129, audio_tagging_loss=0.009322, over 3039964.43 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:02:31,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3060686.6666666665, ans=0.125 2023-11-25 21:02:35,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3060753.3333333335, ans=0.125 2023-11-25 21:02:41,585 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.05 vs. 
limit=15.0 2023-11-25 21:02:46,025 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.13 vs. limit=12.0 2023-11-25 21:02:52,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3060820.0, ans=0.125 2023-11-25 21:03:00,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3060886.6666666665, ans=0.125 2023-11-25 21:03:13,468 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459150 2023-11-25 21:03:13,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3060953.3333333335, ans=0.0 2023-11-25 21:03:15,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3060953.3333333335, ans=0.035 2023-11-25 21:03:18,636 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2250, loss[loss=0.06883, simple_loss=0.08977, pruned_loss=0.01581, audio_tagging_loss=0.008137, over 15672.00 frames. ], tot_loss[loss=0.06828, simple_loss=0.09183, pruned_loss=0.01304, audio_tagging_loss=0.009317, over 3036554.95 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:03:19,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3061020.0, ans=0.0 2023-11-25 21:03:28,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3061086.6666666665, ans=0.1 2023-11-25 21:03:32,818 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.83 vs. limit=10.0 2023-11-25 21:03:37,520 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.98 vs. 
limit=12.0 2023-11-25 21:03:47,473 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.267e+01 8.734e+01 9.500e+01 1.033e+02 1.214e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-25 21:03:49,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3061153.3333333335, ans=0.09899494936611666 2023-11-25 21:04:00,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3061220.0, ans=0.125 2023-11-25 21:04:05,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3061286.6666666665, ans=0.125 2023-11-25 21:04:05,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3061286.6666666665, ans=0.125 2023-11-25 21:04:07,700 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459200 2023-11-25 21:04:09,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3061286.6666666665, ans=0.0 2023-11-25 21:04:09,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3061286.6666666665, ans=0.07 2023-11-25 21:04:12,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3061286.6666666665, ans=0.125 2023-11-25 21:04:13,873 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2300, loss[loss=0.08593, simple_loss=0.1164, pruned_loss=0.01879, audio_tagging_loss=0.00893, over 15512.00 frames. ], tot_loss[loss=0.06824, simple_loss=0.09165, pruned_loss=0.01301, audio_tagging_loss=0.009404, over 3037058.56 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:04:18,094 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.40 vs. limit=22.5 2023-11-25 21:04:37,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3061486.6666666665, ans=0.0 2023-11-25 21:04:41,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3061486.6666666665, ans=0.0 2023-11-25 21:04:43,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3061486.6666666665, ans=0.125 2023-11-25 21:04:48,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3061553.3333333335, ans=0.125 2023-11-25 21:05:03,121 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 21:05:03,164 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459250 2023-11-25 21:05:08,344 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2350, loss[loss=0.06455, simple_loss=0.07644, pruned_loss=0.01305, audio_tagging_loss=0.01328, over 14498.00 frames. 
], tot_loss[loss=0.06842, simple_loss=0.09204, pruned_loss=0.01302, audio_tagging_loss=0.009381, over 3042402.99 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:05:10,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3061686.6666666665, ans=0.0 2023-11-25 21:05:19,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3061753.3333333335, ans=0.2 2023-11-25 21:05:36,647 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.495e+01 8.670e+01 9.358e+01 1.017e+02 1.318e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-25 21:05:41,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3061886.6666666665, ans=0.1 2023-11-25 21:05:57,202 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459300 2023-11-25 21:05:58,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3061953.3333333335, ans=0.125 2023-11-25 21:06:02,360 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2400, loss[loss=0.07743, simple_loss=0.09847, pruned_loss=0.01879, audio_tagging_loss=0.009399, over 16349.00 frames. ], tot_loss[loss=0.06768, simple_loss=0.09053, pruned_loss=0.01284, audio_tagging_loss=0.009579, over 3042027.22 frames. ], batch size: 64, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:06:11,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3062086.6666666665, ans=0.125 2023-11-25 21:06:19,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3062086.6666666665, ans=0.125 2023-11-25 21:06:20,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3062086.6666666665, ans=0.2 2023-11-25 21:06:22,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3062086.6666666665, ans=0.0 2023-11-25 21:06:24,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3062153.3333333335, ans=0.125 2023-11-25 21:06:28,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3062153.3333333335, ans=0.125 2023-11-25 21:06:29,482 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.10 vs. limit=10.0 2023-11-25 21:06:31,341 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.28 vs. limit=22.5 2023-11-25 21:06:50,892 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459350 2023-11-25 21:06:56,593 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2450, loss[loss=0.0529, simple_loss=0.06645, pruned_loss=0.009693, audio_tagging_loss=0.009978, over 15451.00 frames. ], tot_loss[loss=0.06774, simple_loss=0.09061, pruned_loss=0.01281, audio_tagging_loss=0.009624, over 3047051.35 frames. 
], batch size: 60, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:07:24,407 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.498e+01 8.490e+01 9.348e+01 1.014e+02 1.568e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-25 21:07:29,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3062553.3333333335, ans=0.0 2023-11-25 21:07:32,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3062553.3333333335, ans=0.125 2023-11-25 21:07:41,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3062620.0, ans=0.0 2023-11-25 21:07:45,842 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459400 2023-11-25 21:07:51,343 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2500, loss[loss=0.05326, simple_loss=0.05277, pruned_loss=0.00951, audio_tagging_loss=0.01737, over 13481.00 frames. ], tot_loss[loss=0.06816, simple_loss=0.09131, pruned_loss=0.01288, audio_tagging_loss=0.009624, over 3055863.54 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:07:53,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3062686.6666666665, ans=0.0 2023-11-25 21:07:58,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3062686.6666666665, ans=0.125 2023-11-25 21:07:59,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3062686.6666666665, ans=0.125 2023-11-25 21:08:06,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3062753.3333333335, ans=0.125 2023-11-25 21:08:08,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3062753.3333333335, ans=0.0 2023-11-25 21:08:10,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3062753.3333333335, ans=0.0 2023-11-25 21:08:14,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3062820.0, ans=0.0 2023-11-25 21:08:35,064 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.18 vs. limit=15.0 2023-11-25 21:08:39,826 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459450 2023-11-25 21:08:44,902 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2550, loss[loss=0.06094, simple_loss=0.08342, pruned_loss=0.0104, audio_tagging_loss=0.008826, over 15284.00 frames. ], tot_loss[loss=0.06755, simple_loss=0.09046, pruned_loss=0.01272, audio_tagging_loss=0.009595, over 3049333.69 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:09:12,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3063153.3333333335, ans=0.125 2023-11-25 21:09:13,334 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.057e+01 8.639e+01 9.480e+01 1.019e+02 1.523e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-25 21:09:31,721 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.63 vs. 
limit=15.0 2023-11-25 21:09:33,262 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459500 2023-11-25 21:09:37,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3063353.3333333335, ans=0.1 2023-11-25 21:09:38,364 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2600, loss[loss=0.04619, simple_loss=0.06235, pruned_loss=0.006983, audio_tagging_loss=0.008031, over 15288.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.08957, pruned_loss=0.01253, audio_tagging_loss=0.009462, over 3054497.62 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:09:42,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3063353.3333333335, ans=0.0 2023-11-25 21:09:42,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3063353.3333333335, ans=0.09899494936611666 2023-11-25 21:09:42,586 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.85 vs. limit=6.0 2023-11-25 21:10:14,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3063553.3333333335, ans=0.0 2023-11-25 21:10:25,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3063620.0, ans=0.125 2023-11-25 21:10:28,899 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459550 2023-11-25 21:10:32,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3063620.0, ans=0.125 2023-11-25 21:10:34,010 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2650, loss[loss=0.06444, simple_loss=0.0827, pruned_loss=0.0105, audio_tagging_loss=0.0126, over 14694.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08877, pruned_loss=0.01223, audio_tagging_loss=0.009421, over 3052591.67 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:10:49,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3063753.3333333335, ans=0.125 2023-11-25 21:10:50,369 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.46 vs. limit=22.5 2023-11-25 21:10:54,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3063820.0, ans=0.1 2023-11-25 21:11:01,168 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.873e+01 8.401e+01 9.228e+01 9.795e+01 1.294e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-25 21:11:07,657 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.41 vs. 
limit=22.5 2023-11-25 21:11:16,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3063953.3333333335, ans=0.125 2023-11-25 21:11:22,695 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459600 2023-11-25 21:11:26,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3063953.3333333335, ans=0.1 2023-11-25 21:11:28,313 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2700, loss[loss=0.06604, simple_loss=0.08463, pruned_loss=0.01584, audio_tagging_loss=0.007881, over 15831.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.08927, pruned_loss=0.01243, audio_tagging_loss=0.009335, over 3055407.89 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:11:29,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3064020.0, ans=0.125 2023-11-25 21:11:31,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3064020.0, ans=0.1 2023-11-25 21:11:33,196 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0 2023-11-25 21:11:55,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3064153.3333333335, ans=0.95 2023-11-25 21:12:01,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3064220.0, ans=0.125 2023-11-25 21:12:11,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3064286.6666666665, ans=10.0 2023-11-25 21:12:12,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3064286.6666666665, ans=0.1 2023-11-25 21:12:13,685 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:12:16,711 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459650 2023-11-25 21:12:19,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3064286.6666666665, ans=0.2 2023-11-25 21:12:20,136 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.45 vs. limit=22.5 2023-11-25 21:12:21,807 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2750, loss[loss=0.05921, simple_loss=0.07768, pruned_loss=0.01084, audio_tagging_loss=0.009526, over 15803.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.0888, pruned_loss=0.01228, audio_tagging_loss=0.009273, over 3054573.93 frames. 
], batch size: 59, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:12:36,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3064420.0, ans=0.07 2023-11-25 21:12:41,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3064420.0, ans=0.125 2023-11-25 21:12:44,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3064486.6666666665, ans=0.125 2023-11-25 21:12:50,729 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.974e+01 8.675e+01 9.044e+01 9.844e+01 1.238e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-25 21:13:08,989 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 21:13:11,082 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459700 2023-11-25 21:13:17,190 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2800, loss[loss=0.06823, simple_loss=0.09407, pruned_loss=0.009989, audio_tagging_loss=0.01121, over 15443.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.0896, pruned_loss=0.01243, audio_tagging_loss=0.00925, over 3063223.76 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:13:27,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3064753.3333333335, ans=0.125 2023-11-25 21:13:28,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3064753.3333333335, ans=0.125 2023-11-25 21:13:28,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=3064753.3333333335, ans=22.5 2023-11-25 21:13:33,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3064753.3333333335, ans=0.04949747468305833 2023-11-25 21:13:37,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3064820.0, ans=0.0 2023-11-25 21:13:45,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3064820.0, ans=0.125 2023-11-25 21:14:03,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3064953.3333333335, ans=0.1 2023-11-25 21:14:06,787 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459750 2023-11-25 21:14:08,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3064953.3333333335, ans=0.125 2023-11-25 21:14:08,329 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.94 vs. 
limit=10.0 2023-11-25 21:14:11,865 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2850, loss[loss=0.04978, simple_loss=0.06629, pruned_loss=0.008474, audio_tagging_loss=0.008163, over 15245.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08872, pruned_loss=0.01226, audio_tagging_loss=0.009364, over 3052081.46 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:14:34,865 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.76 vs. limit=12.0 2023-11-25 21:14:39,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3065153.3333333335, ans=0.0 2023-11-25 21:14:39,979 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.345e+01 8.595e+01 9.142e+01 9.903e+01 1.163e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-25 21:14:56,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3065286.6666666665, ans=0.125 2023-11-25 21:14:57,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3065286.6666666665, ans=0.0 2023-11-25 21:15:00,260 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459800 2023-11-25 21:15:05,816 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2900, loss[loss=0.05677, simple_loss=0.07793, pruned_loss=0.00912, audio_tagging_loss=0.008682, over 14633.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.08953, pruned_loss=0.01234, audio_tagging_loss=0.009215, over 3048405.21 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:15:19,808 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.72 vs. limit=15.0 2023-11-25 21:15:21,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3065420.0, ans=0.04949747468305833 2023-11-25 21:15:33,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3065486.6666666665, ans=0.125 2023-11-25 21:15:39,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3065553.3333333335, ans=0.125 2023-11-25 21:15:49,375 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.19 vs. limit=22.5 2023-11-25 21:15:54,715 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459850 2023-11-25 21:15:54,910 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:16:00,336 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 2950, loss[loss=0.06238, simple_loss=0.08395, pruned_loss=0.009449, audio_tagging_loss=0.01095, over 14650.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.0902, pruned_loss=0.01257, audio_tagging_loss=0.009243, over 3052493.17 frames. 
], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:16:06,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3065686.6666666665, ans=0.09899494936611666 2023-11-25 21:16:13,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3065753.3333333335, ans=0.125 2023-11-25 21:16:15,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3065753.3333333335, ans=0.0 2023-11-25 21:16:15,707 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:16:17,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3065753.3333333335, ans=0.0 2023-11-25 21:16:17,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3065753.3333333335, ans=0.0 2023-11-25 21:16:19,006 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0 2023-11-25 21:16:25,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3065820.0, ans=0.125 2023-11-25 21:16:27,995 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.331e+01 8.766e+01 9.471e+01 1.038e+02 1.516e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-25 21:16:32,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3065886.6666666665, ans=0.125 2023-11-25 21:16:37,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3065886.6666666665, ans=0.0 2023-11-25 21:16:48,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3065953.3333333335, ans=0.125 2023-11-25 21:16:49,102 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459900 2023-11-25 21:16:49,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3065953.3333333335, ans=0.0 2023-11-25 21:16:54,790 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3000, loss[loss=0.06342, simple_loss=0.08341, pruned_loss=0.01235, audio_tagging_loss=0.00936, over 15388.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09043, pruned_loss=0.01265, audio_tagging_loss=0.00931, over 3054369.81 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:16:54,792 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-25 21:17:26,412 INFO [train_asr.py:1267] (0/4) Epoch 39, validation: loss=0.05939, simple_loss=0.05076, pruned_loss=0.005254, audio_tagging_loss=0.02875, over 4681554.00 frames. 2023-11-25 21:17:26,413 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-25 21:17:27,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3066020.0, ans=0.5 2023-11-25 21:17:45,993 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.65 vs. 
limit=15.0 2023-11-25 21:17:46,061 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.81 vs. limit=15.0 2023-11-25 21:17:50,432 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:17:53,952 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.99 vs. limit=22.5 2023-11-25 21:17:54,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3066153.3333333335, ans=0.2 2023-11-25 21:18:13,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3066286.6666666665, ans=0.125 2023-11-25 21:18:15,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3066286.6666666665, ans=0.0 2023-11-25 21:18:16,081 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 459950 2023-11-25 21:18:16,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3066286.6666666665, ans=0.1 2023-11-25 21:18:21,895 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3050, loss[loss=0.07928, simple_loss=0.1056, pruned_loss=0.01754, audio_tagging_loss=0.008934, over 15583.00 frames. ], tot_loss[loss=0.06797, simple_loss=0.09155, pruned_loss=0.01286, audio_tagging_loss=0.009341, over 3059436.74 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:18:35,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3066420.0, ans=0.125 2023-11-25 21:18:42,149 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.36 vs. limit=15.0 2023-11-25 21:18:42,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3066486.6666666665, ans=0.125 2023-11-25 21:18:49,814 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.661e+01 8.558e+01 9.280e+01 1.009e+02 1.298e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-25 21:18:50,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3066486.6666666665, ans=0.125 2023-11-25 21:18:53,062 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 21:18:56,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3066553.3333333335, ans=0.125 2023-11-25 21:19:10,721 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. 
limit=6.0 2023-11-25 21:19:11,074 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460000 2023-11-25 21:19:12,552 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-460000.pt 2023-11-25 21:19:19,031 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3100, loss[loss=0.06828, simple_loss=0.09613, pruned_loss=0.01142, audio_tagging_loss=0.008794, over 14919.00 frames. ], tot_loss[loss=0.06757, simple_loss=0.09067, pruned_loss=0.01273, audio_tagging_loss=0.009507, over 3054118.89 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:19:24,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3066686.6666666665, ans=0.125 2023-11-25 21:19:38,945 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.86 vs. limit=15.0 2023-11-25 21:20:03,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3066953.3333333335, ans=0.1 2023-11-25 21:20:07,921 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460050 2023-11-25 21:20:11,527 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.63 vs. limit=22.5 2023-11-25 21:20:13,163 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3150, loss[loss=0.04816, simple_loss=0.0635, pruned_loss=0.006213, audio_tagging_loss=0.0102, over 15278.00 frames. ], tot_loss[loss=0.06763, simple_loss=0.09057, pruned_loss=0.01275, audio_tagging_loss=0.009594, over 3052251.33 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:20:18,759 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:20:42,066 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.488e+01 8.732e+01 9.474e+01 1.004e+02 1.246e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-25 21:21:03,033 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460100 2023-11-25 21:21:03,514 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.94 vs. limit=22.5 2023-11-25 21:21:09,290 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3200, loss[loss=0.05962, simple_loss=0.07536, pruned_loss=0.01105, audio_tagging_loss=0.0109, over 15965.00 frames. ], tot_loss[loss=0.06791, simple_loss=0.09116, pruned_loss=0.01266, audio_tagging_loss=0.009665, over 3055241.16 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:21:18,280 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.28 vs. limit=15.0 2023-11-25 21:21:25,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3067420.0, ans=0.125 2023-11-25 21:21:36,666 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.42 vs. 
limit=15.0 2023-11-25 21:21:38,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3067486.6666666665, ans=0.125 2023-11-25 21:21:58,633 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460150 2023-11-25 21:22:03,806 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3250, loss[loss=0.05095, simple_loss=0.05177, pruned_loss=0.008877, audio_tagging_loss=0.01619, over 15476.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09024, pruned_loss=0.01243, audio_tagging_loss=0.009668, over 3055171.27 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:22:09,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3067686.6666666665, ans=0.125 2023-11-25 21:22:32,985 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 8.616e+01 9.104e+01 1.013e+02 1.269e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-25 21:22:35,339 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:22:48,077 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:22:53,163 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460200 2023-11-25 21:22:59,164 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3300, loss[loss=0.0515, simple_loss=0.06977, pruned_loss=0.005339, audio_tagging_loss=0.01128, over 15594.00 frames. ], tot_loss[loss=0.06748, simple_loss=0.09039, pruned_loss=0.01253, audio_tagging_loss=0.009755, over 3049117.50 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:22:59,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3068020.0, ans=0.125 2023-11-25 21:23:03,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3068020.0, ans=0.125 2023-11-25 21:23:24,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3068153.3333333335, ans=0.2 2023-11-25 21:23:28,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3068153.3333333335, ans=0.125 2023-11-25 21:23:44,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3068286.6666666665, ans=0.05 2023-11-25 21:23:44,676 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.66 vs. limit=15.0 2023-11-25 21:23:48,263 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460250 2023-11-25 21:23:54,406 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3350, loss[loss=0.08149, simple_loss=0.1086, pruned_loss=0.01628, audio_tagging_loss=0.01091, over 14806.00 frames. ], tot_loss[loss=0.06736, simple_loss=0.09042, pruned_loss=0.01253, audio_tagging_loss=0.009618, over 3051319.42 frames. 
], batch size: 55, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:24:22,946 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.162e+01 8.673e+01 9.369e+01 1.012e+02 1.333e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-25 21:24:30,209 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.37 vs. limit=5.0 2023-11-25 21:24:43,037 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.15 vs. limit=6.0 2023-11-25 21:24:44,759 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460300 2023-11-25 21:24:48,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3068620.0, ans=0.0 2023-11-25 21:24:50,009 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3400, loss[loss=0.074, simple_loss=0.1014, pruned_loss=0.01691, audio_tagging_loss=0.006383, over 16110.00 frames. ], tot_loss[loss=0.06788, simple_loss=0.09112, pruned_loss=0.01279, audio_tagging_loss=0.009527, over 3051057.73 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:25:01,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3068753.3333333335, ans=0.125 2023-11-25 21:25:06,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3068753.3333333335, ans=0.0 2023-11-25 21:25:30,040 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.46 vs. limit=22.5 2023-11-25 21:25:39,068 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460350 2023-11-25 21:25:40,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3068953.3333333335, ans=0.2 2023-11-25 21:25:41,716 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.23 vs. limit=15.0 2023-11-25 21:25:44,250 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3450, loss[loss=0.06098, simple_loss=0.08751, pruned_loss=0.009548, audio_tagging_loss=0.00768, over 15018.00 frames. ], tot_loss[loss=0.06743, simple_loss=0.09057, pruned_loss=0.01271, audio_tagging_loss=0.009426, over 3046184.88 frames. 
], batch size: 54, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:25:55,075 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:25:59,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3069086.6666666665, ans=0.5 2023-11-25 21:26:02,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3069086.6666666665, ans=0.05 2023-11-25 21:26:13,688 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.466e+01 8.826e+01 9.469e+01 1.006e+02 1.325e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-25 21:26:19,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3069220.0, ans=0.0 2023-11-25 21:26:24,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3069220.0, ans=0.1 2023-11-25 21:26:29,191 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.55 vs. limit=6.0 2023-11-25 21:26:31,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3069286.6666666665, ans=0.1 2023-11-25 21:26:32,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3069286.6666666665, ans=0.125 2023-11-25 21:26:34,538 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460400 2023-11-25 21:26:37,199 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:26:40,241 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3500, loss[loss=0.06248, simple_loss=0.0827, pruned_loss=0.01185, audio_tagging_loss=0.009283, over 14689.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09081, pruned_loss=0.01289, audio_tagging_loss=0.009325, over 3054613.70 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:26:51,278 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.32 vs. limit=15.0 2023-11-25 21:27:04,752 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:27:08,732 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 21:27:09,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3069486.6666666665, ans=0.125 2023-11-25 21:27:30,901 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460450 2023-11-25 21:27:36,114 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3550, loss[loss=0.04854, simple_loss=0.06302, pruned_loss=0.008233, audio_tagging_loss=0.008796, over 14860.00 frames. 
], tot_loss[loss=0.06726, simple_loss=0.0905, pruned_loss=0.01274, audio_tagging_loss=0.009268, over 3050997.81 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:27:58,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3069820.0, ans=0.125 2023-11-25 21:28:00,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3069820.0, ans=0.125 2023-11-25 21:28:01,362 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.19 vs. limit=12.0 2023-11-25 21:28:05,014 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.043e+01 8.535e+01 9.230e+01 9.860e+01 1.398e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-25 21:28:05,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3069820.0, ans=0.2 2023-11-25 21:28:20,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3069953.3333333335, ans=0.125 2023-11-25 21:28:24,824 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460500 2023-11-25 21:28:26,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3069953.3333333335, ans=0.1 2023-11-25 21:28:30,004 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3600, loss[loss=0.0539, simple_loss=0.06651, pruned_loss=0.01047, audio_tagging_loss=0.01017, over 14034.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09022, pruned_loss=0.01276, audio_tagging_loss=0.009213, over 3043094.83 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:28:56,586 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2023-11-25 21:29:14,377 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.47 vs. limit=22.5 2023-11-25 21:29:19,216 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460550 2023-11-25 21:29:24,871 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3650, loss[loss=0.08324, simple_loss=0.1139, pruned_loss=0.01849, audio_tagging_loss=0.007815, over 14842.00 frames. ], tot_loss[loss=0.06752, simple_loss=0.09081, pruned_loss=0.01294, audio_tagging_loss=0.009176, over 3048517.52 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:29:39,194 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:29:40,709 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.44 vs. 
limit=22.5 2023-11-25 21:29:42,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3070420.0, ans=0.125 2023-11-25 21:29:42,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3070420.0, ans=0.0 2023-11-25 21:29:43,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3070420.0, ans=0.1 2023-11-25 21:29:54,665 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.402e+01 8.628e+01 9.158e+01 1.002e+02 1.364e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-25 21:30:15,074 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460600 2023-11-25 21:30:18,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3070620.0, ans=0.125 2023-11-25 21:30:20,511 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3700, loss[loss=0.06066, simple_loss=0.07019, pruned_loss=0.01472, audio_tagging_loss=0.01084, over 14323.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.09053, pruned_loss=0.01297, audio_tagging_loss=0.009176, over 3045350.34 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:30:21,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3070686.6666666665, ans=0.5 2023-11-25 21:30:37,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3070753.3333333335, ans=0.125 2023-11-25 21:30:42,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3070820.0, ans=0.2 2023-11-25 21:30:44,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3070820.0, ans=0.1 2023-11-25 21:31:09,773 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460650 2023-11-25 21:31:14,933 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3750, loss[loss=0.06578, simple_loss=0.09359, pruned_loss=0.01018, audio_tagging_loss=0.008805, over 15819.00 frames. ], tot_loss[loss=0.06797, simple_loss=0.09143, pruned_loss=0.01313, audio_tagging_loss=0.009122, over 3046340.29 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:31:15,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3071020.0, ans=0.0 2023-11-25 21:31:45,430 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.367e+01 8.797e+01 9.429e+01 1.022e+02 1.345e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-25 21:31:48,067 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.86 vs. limit=15.0 2023-11-25 21:31:52,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3071220.0, ans=0.125 2023-11-25 21:31:53,789 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 21:31:57,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3071220.0, ans=0.2 2023-11-25 21:32:02,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3071286.6666666665, ans=0.2 2023-11-25 21:32:03,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3071286.6666666665, ans=0.0 2023-11-25 21:32:04,174 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460700 2023-11-25 21:32:09,399 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3800, loss[loss=0.07936, simple_loss=0.118, pruned_loss=0.01456, audio_tagging_loss=0.005775, over 15478.00 frames. ], tot_loss[loss=0.06772, simple_loss=0.09121, pruned_loss=0.01307, audio_tagging_loss=0.009043, over 3044490.99 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:32:28,571 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.78 vs. limit=15.0 2023-11-25 21:32:39,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3071486.6666666665, ans=0.2 2023-11-25 21:32:57,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3071620.0, ans=0.125 2023-11-25 21:32:59,694 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460750 2023-11-25 21:33:05,936 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3850, loss[loss=0.06246, simple_loss=0.09316, pruned_loss=0.01, audio_tagging_loss=0.005883, over 15155.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09062, pruned_loss=0.01289, audio_tagging_loss=0.009149, over 3046205.56 frames. 
], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:33:06,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3071686.6666666665, ans=0.125 2023-11-25 21:33:10,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3071686.6666666665, ans=0.125 2023-11-25 21:33:26,081 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:33:27,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3071820.0, ans=0.125 2023-11-25 21:33:34,156 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 8.515e+01 9.072e+01 9.640e+01 1.260e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-25 21:33:38,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3071886.6666666665, ans=0.1 2023-11-25 21:33:40,786 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:33:53,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3071953.3333333335, ans=0.0 2023-11-25 21:33:55,450 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460800 2023-11-25 21:34:00,980 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3900, loss[loss=0.07588, simple_loss=0.0955, pruned_loss=0.01639, audio_tagging_loss=0.01175, over 16279.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09, pruned_loss=0.01261, audio_tagging_loss=0.009249, over 3042807.53 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:34:37,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3072220.0, ans=0.125 2023-11-25 21:34:50,214 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460850 2023-11-25 21:34:55,275 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 3950, loss[loss=0.06944, simple_loss=0.09246, pruned_loss=0.01269, audio_tagging_loss=0.01052, over 15034.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.08932, pruned_loss=0.01269, audio_tagging_loss=0.009362, over 3037508.80 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:34:58,965 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.45 vs. limit=15.0 2023-11-25 21:35:08,204 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:35:26,690 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.900e+01 8.593e+01 9.164e+01 9.900e+01 1.243e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-25 21:35:28,299 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.89 vs. 
limit=6.0 2023-11-25 21:35:33,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3072553.3333333335, ans=0.125 2023-11-25 21:35:41,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3072620.0, ans=0.95 2023-11-25 21:35:45,358 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460900 2023-11-25 21:35:51,040 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4000, loss[loss=0.06896, simple_loss=0.08988, pruned_loss=0.01438, audio_tagging_loss=0.009645, over 16335.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.08974, pruned_loss=0.01272, audio_tagging_loss=0.009362, over 3039983.25 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:36:06,094 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0 2023-11-25 21:36:06,178 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.90 vs. limit=15.0 2023-11-25 21:36:20,708 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0 2023-11-25 21:36:29,929 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:36:40,664 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 460950 2023-11-25 21:36:44,945 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.82 vs. limit=6.0 2023-11-25 21:36:46,418 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4050, loss[loss=0.05203, simple_loss=0.06392, pruned_loss=0.007775, audio_tagging_loss=0.0123, over 15914.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.09011, pruned_loss=0.01275, audio_tagging_loss=0.00937, over 3040958.80 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:36:46,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3073020.0, ans=0.0 2023-11-25 21:36:47,579 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:36:50,627 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-25 21:37:02,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3073086.6666666665, ans=0.0 2023-11-25 21:37:18,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3073220.0, ans=0.125 2023-11-25 21:37:18,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3073220.0, ans=0.125 2023-11-25 21:37:19,530 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.319e+01 8.880e+01 9.594e+01 1.042e+02 1.593e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-25 21:37:19,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3073220.0, ans=0.0 2023-11-25 21:37:21,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3073220.0, ans=0.125 2023-11-25 21:37:31,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3073286.6666666665, ans=0.125 2023-11-25 21:37:35,836 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461000 2023-11-25 21:37:38,723 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.95 vs. limit=22.5 2023-11-25 21:37:41,379 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4100, loss[loss=0.09035, simple_loss=0.1329, pruned_loss=0.01451, audio_tagging_loss=0.009393, over 16187.00 frames. ], tot_loss[loss=0.06752, simple_loss=0.09068, pruned_loss=0.01282, audio_tagging_loss=0.009363, over 3046055.67 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:37:48,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3073353.3333333335, ans=0.0 2023-11-25 21:37:55,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3073420.0, ans=0.125 2023-11-25 21:37:57,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3073420.0, ans=0.125 2023-11-25 21:37:58,563 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.98 vs. limit=15.0 2023-11-25 21:38:26,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3073620.0, ans=0.125 2023-11-25 21:38:30,694 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461050 2023-11-25 21:38:36,934 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4150, loss[loss=0.07155, simple_loss=0.08865, pruned_loss=0.01642, audio_tagging_loss=0.0108, over 16133.00 frames. ], tot_loss[loss=0.06779, simple_loss=0.0912, pruned_loss=0.01293, audio_tagging_loss=0.009262, over 3054021.47 frames. 
], batch size: 60, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:38:40,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3073686.6666666665, ans=0.125 2023-11-25 21:38:59,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3073820.0, ans=0.125 2023-11-25 21:39:01,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3073820.0, ans=0.125 2023-11-25 21:39:02,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3073820.0, ans=0.125 2023-11-25 21:39:03,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3073820.0, ans=0.125 2023-11-25 21:39:09,593 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.340e+01 8.760e+01 9.274e+01 9.766e+01 1.268e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-25 21:39:14,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3073886.6666666665, ans=0.0 2023-11-25 21:39:17,902 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 21:39:26,819 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461100 2023-11-25 21:39:32,005 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4200, loss[loss=0.07257, simple_loss=0.1051, pruned_loss=0.01195, audio_tagging_loss=0.008066, over 15660.00 frames. ], tot_loss[loss=0.06745, simple_loss=0.09107, pruned_loss=0.01278, audio_tagging_loss=0.009136, over 3055060.05 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:39:34,614 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.74 vs. 
limit=15.0 2023-11-25 21:39:44,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3074086.6666666665, ans=0.125 2023-11-25 21:40:03,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3074153.3333333335, ans=0.125 2023-11-25 21:40:05,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3074220.0, ans=0.125 2023-11-25 21:40:09,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3074220.0, ans=0.0 2023-11-25 21:40:10,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3074220.0, ans=0.0 2023-11-25 21:40:14,282 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:40:21,678 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461150 2023-11-25 21:40:22,719 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.87 vs. limit=15.0 2023-11-25 21:40:27,302 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4250, loss[loss=0.077, simple_loss=0.1045, pruned_loss=0.01705, audio_tagging_loss=0.007721, over 15069.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09073, pruned_loss=0.01266, audio_tagging_loss=0.009106, over 3052213.38 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:40:29,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3074353.3333333335, ans=0.125 2023-11-25 21:41:00,147 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.068e+01 8.649e+01 9.520e+01 1.007e+02 1.325e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-25 21:41:00,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3074553.3333333335, ans=0.0 2023-11-25 21:41:08,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3074553.3333333335, ans=0.125 2023-11-25 21:41:15,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3074620.0, ans=0.125 2023-11-25 21:41:16,496 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461200 2023-11-25 21:41:22,524 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4300, loss[loss=0.06905, simple_loss=0.09003, pruned_loss=0.01429, audio_tagging_loss=0.009748, over 14412.00 frames. ], tot_loss[loss=0.06748, simple_loss=0.09119, pruned_loss=0.01288, audio_tagging_loss=0.009003, over 3053539.60 frames. 
], batch size: 56, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:41:33,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3074753.3333333335, ans=0.0 2023-11-25 21:41:38,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3074753.3333333335, ans=0.125 2023-11-25 21:42:00,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3074886.6666666665, ans=0.05 2023-11-25 21:42:09,750 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.59 vs. limit=15.0 2023-11-25 21:42:13,036 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461250 2023-11-25 21:42:16,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3074953.3333333335, ans=0.125 2023-11-25 21:42:18,148 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4350, loss[loss=0.05914, simple_loss=0.0709, pruned_loss=0.01269, audio_tagging_loss=0.011, over 15696.00 frames. ], tot_loss[loss=0.06797, simple_loss=0.09178, pruned_loss=0.01305, audio_tagging_loss=0.009024, over 3056991.45 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:42:50,826 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.63 vs. limit=15.0 2023-11-25 21:42:51,203 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.343e+01 8.819e+01 9.341e+01 1.009e+02 3.956e+02, threshold=1.868e+02, percent-clipped=1.0 2023-11-25 21:42:51,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3075220.0, ans=0.015 2023-11-25 21:42:55,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3075220.0, ans=0.0 2023-11-25 21:43:07,466 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461300 2023-11-25 21:43:12,644 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4400, loss[loss=0.06974, simple_loss=0.091, pruned_loss=0.01362, audio_tagging_loss=0.01063, over 15679.00 frames. ], tot_loss[loss=0.06758, simple_loss=0.09117, pruned_loss=0.01289, audio_tagging_loss=0.009102, over 3062445.94 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:43:18,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3075353.3333333335, ans=0.0 2023-11-25 21:43:42,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3075486.6666666665, ans=0.025 2023-11-25 21:43:50,406 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:43:53,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3075553.3333333335, ans=0.0 2023-11-25 21:43:55,963 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.61 vs. 
limit=10.0 2023-11-25 21:44:01,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3075620.0, ans=0.125 2023-11-25 21:44:02,317 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461350 2023-11-25 21:44:07,977 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4450, loss[loss=0.07099, simple_loss=0.09562, pruned_loss=0.01347, audio_tagging_loss=0.00971, over 14381.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09063, pruned_loss=0.01274, audio_tagging_loss=0.00908, over 3059480.17 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:44:19,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3075753.3333333335, ans=0.0 2023-11-25 21:44:41,811 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.256e+01 8.665e+01 9.390e+01 1.023e+02 1.193e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-25 21:44:49,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3075886.6666666665, ans=0.0 2023-11-25 21:44:57,525 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461400 2023-11-25 21:45:00,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3075953.3333333335, ans=0.2 2023-11-25 21:45:03,432 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4500, loss[loss=0.07644, simple_loss=0.1022, pruned_loss=0.0172, audio_tagging_loss=0.008141, over 14856.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09028, pruned_loss=0.01262, audio_tagging_loss=0.009092, over 3055966.38 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:45:22,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3076086.6666666665, ans=0.125 2023-11-25 21:45:26,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3076153.3333333335, ans=0.0 2023-11-25 21:45:52,282 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461450 2023-11-25 21:45:57,461 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4550, loss[loss=0.04666, simple_loss=0.05955, pruned_loss=0.006687, audio_tagging_loss=0.0102, over 15271.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09027, pruned_loss=0.01282, audio_tagging_loss=0.009141, over 3053059.10 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:46:00,102 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.45 vs. limit=12.0 2023-11-25 21:46:02,111 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.64 vs. 
limit=15.0 2023-11-25 21:46:21,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3076486.6666666665, ans=0.0 2023-11-25 21:46:23,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3076486.6666666665, ans=0.125 2023-11-25 21:46:28,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3076486.6666666665, ans=0.1 2023-11-25 21:46:31,606 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.239e+01 8.489e+01 8.972e+01 9.819e+01 1.195e+02, threshold=1.794e+02, percent-clipped=0.0 2023-11-25 21:46:40,029 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 21:46:45,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3076620.0, ans=0.125 2023-11-25 21:46:46,248 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461500 2023-11-25 21:46:47,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3076620.0, ans=0.125 2023-11-25 21:46:48,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3076620.0, ans=0.0 2023-11-25 21:46:48,715 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.71 vs. limit=22.5 2023-11-25 21:46:51,810 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4600, loss[loss=0.04135, simple_loss=0.05199, pruned_loss=0.004942, audio_tagging_loss=0.01041, over 14255.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.08924, pruned_loss=0.01265, audio_tagging_loss=0.009213, over 3052819.38 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:46:57,064 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.42 vs. limit=12.0 2023-11-25 21:47:41,698 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461550 2023-11-25 21:47:47,364 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4650, loss[loss=0.09119, simple_loss=0.13, pruned_loss=0.01757, audio_tagging_loss=0.008627, over 15842.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.08934, pruned_loss=0.01262, audio_tagging_loss=0.009228, over 3054813.35 frames. 
], batch size: 56, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:48:01,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3077086.6666666665, ans=0.0 2023-11-25 21:48:20,604 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.306e+01 8.563e+01 9.172e+01 1.006e+02 1.160e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-25 21:48:36,258 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461600 2023-11-25 21:48:41,748 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4700, loss[loss=0.07814, simple_loss=0.1093, pruned_loss=0.01669, audio_tagging_loss=0.006814, over 14771.00 frames. ], tot_loss[loss=0.06728, simple_loss=0.0906, pruned_loss=0.01279, audio_tagging_loss=0.009193, over 3053052.13 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:48:56,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3077420.0, ans=0.2 2023-11-25 21:48:56,532 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.89 vs. limit=15.0 2023-11-25 21:49:06,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3077486.6666666665, ans=0.0 2023-11-25 21:49:13,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3077553.3333333335, ans=0.07 2023-11-25 21:49:21,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3077553.3333333335, ans=0.1 2023-11-25 21:49:29,922 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461650 2023-11-25 21:49:35,076 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4750, loss[loss=0.04875, simple_loss=0.06231, pruned_loss=0.007856, audio_tagging_loss=0.009741, over 16069.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.08987, pruned_loss=0.01269, audio_tagging_loss=0.009331, over 3043636.03 frames. ], batch size: 63, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:49:55,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3077753.3333333335, ans=0.0 2023-11-25 21:49:58,440 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.79 vs. limit=15.0 2023-11-25 21:50:03,492 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.98 vs. limit=22.5 2023-11-25 21:50:05,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3077820.0, ans=0.2 2023-11-25 21:50:09,278 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.330e+01 8.924e+01 9.307e+01 1.025e+02 1.203e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-25 21:50:24,995 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461700 2023-11-25 21:50:25,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3077953.3333333335, ans=0.0 2023-11-25 21:50:30,548 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4800, loss[loss=0.06828, simple_loss=0.08879, pruned_loss=0.01477, audio_tagging_loss=0.009107, over 16539.00 frames. 
], tot_loss[loss=0.06727, simple_loss=0.09018, pruned_loss=0.01277, audio_tagging_loss=0.009406, over 3052649.07 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:50:49,157 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.37 vs. limit=22.5 2023-11-25 21:50:51,471 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.27 vs. limit=12.0 2023-11-25 21:51:02,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3078220.0, ans=0.125 2023-11-25 21:51:06,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3078220.0, ans=0.0 2023-11-25 21:51:11,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3078220.0, ans=0.1 2023-11-25 21:51:18,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3078286.6666666665, ans=0.1 2023-11-25 21:51:19,477 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461750 2023-11-25 21:51:24,560 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4850, loss[loss=0.06219, simple_loss=0.08255, pruned_loss=0.009028, audio_tagging_loss=0.01189, over 15221.00 frames. ], tot_loss[loss=0.06727, simple_loss=0.0903, pruned_loss=0.01261, audio_tagging_loss=0.00951, over 3053495.80 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:51:40,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3078420.0, ans=0.0 2023-11-25 21:51:42,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3078420.0, ans=0.125 2023-11-25 21:51:58,151 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.385e+01 8.740e+01 9.474e+01 1.031e+02 1.193e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-25 21:52:12,719 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461800 2023-11-25 21:52:14,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3078620.0, ans=0.125 2023-11-25 21:52:18,110 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4900, loss[loss=0.06394, simple_loss=0.08379, pruned_loss=0.01342, audio_tagging_loss=0.008622, over 14273.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09011, pruned_loss=0.01256, audio_tagging_loss=0.009443, over 3051975.68 frames. 
], batch size: 53, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:52:21,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3078686.6666666665, ans=0.0 2023-11-25 21:53:04,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3078953.3333333335, ans=0.5 2023-11-25 21:53:05,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3078953.3333333335, ans=0.1 2023-11-25 21:53:07,340 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461850 2023-11-25 21:53:12,990 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 4950, loss[loss=0.04799, simple_loss=0.06572, pruned_loss=0.006094, audio_tagging_loss=0.009035, over 14155.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.09045, pruned_loss=0.01246, audio_tagging_loss=0.009268, over 3046637.59 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:53:27,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3079086.6666666665, ans=0.125 2023-11-25 21:53:44,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3079220.0, ans=0.125 2023-11-25 21:53:44,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3079220.0, ans=0.125 2023-11-25 21:53:46,071 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.137e+01 8.693e+01 9.293e+01 9.943e+01 1.246e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-25 21:54:02,755 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461900 2023-11-25 21:54:07,911 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5000, loss[loss=0.05557, simple_loss=0.06996, pruned_loss=0.009319, audio_tagging_loss=0.01127, over 14719.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09058, pruned_loss=0.01254, audio_tagging_loss=0.009124, over 3048788.29 frames. 
], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:54:09,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3079353.3333333335, ans=0.2 2023-11-25 21:54:12,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3079353.3333333335, ans=0.2 2023-11-25 21:54:15,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3079353.3333333335, ans=0.0 2023-11-25 21:54:18,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3079420.0, ans=0.125 2023-11-25 21:54:45,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3079553.3333333335, ans=0.125 2023-11-25 21:54:47,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3079553.3333333335, ans=0.0 2023-11-25 21:54:51,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3079620.0, ans=0.0 2023-11-25 21:54:52,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3079620.0, ans=0.1 2023-11-25 21:54:56,531 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 461950 2023-11-25 21:55:01,820 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5050, loss[loss=0.05232, simple_loss=0.07228, pruned_loss=0.01029, audio_tagging_loss=0.00589, over 14796.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.09101, pruned_loss=0.01267, audio_tagging_loss=0.008998, over 3041769.24 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:55:22,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3079820.0, ans=0.1 2023-11-25 21:55:32,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3079820.0, ans=0.125 2023-11-25 21:55:34,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3079886.6666666665, ans=0.125 2023-11-25 21:55:35,798 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.630e+01 8.539e+01 9.066e+01 9.676e+01 1.144e+02, threshold=1.813e+02, percent-clipped=0.0 2023-11-25 21:55:50,304 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462000 2023-11-25 21:55:56,012 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5100, loss[loss=0.05814, simple_loss=0.08042, pruned_loss=0.006784, audio_tagging_loss=0.01115, over 15681.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.0902, pruned_loss=0.01252, audio_tagging_loss=0.009054, over 3047426.01 frames. 
], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:55:56,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3080020.0, ans=0.125 2023-11-25 21:56:03,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3080020.0, ans=0.125 2023-11-25 21:56:13,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3080086.6666666665, ans=0.125 2023-11-25 21:56:15,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3080086.6666666665, ans=0.125 2023-11-25 21:56:19,774 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.90 vs. limit=15.0 2023-11-25 21:56:22,545 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:56:45,796 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462050 2023-11-25 21:56:51,454 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5150, loss[loss=0.06042, simple_loss=0.07997, pruned_loss=0.009073, audio_tagging_loss=0.01136, over 15385.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09052, pruned_loss=0.01259, audio_tagging_loss=0.009065, over 3043350.30 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:56:59,308 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.75 vs. limit=12.0 2023-11-25 21:57:03,512 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.19 vs. limit=10.0 2023-11-25 21:57:10,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3080420.0, ans=0.0 2023-11-25 21:57:22,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3080553.3333333335, ans=0.125 2023-11-25 21:57:25,137 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.840e+01 8.763e+01 9.349e+01 9.902e+01 1.210e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-25 21:57:25,893 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2023-11-25 21:57:40,148 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462100 2023-11-25 21:57:45,284 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5200, loss[loss=0.04942, simple_loss=0.06882, pruned_loss=0.00706, audio_tagging_loss=0.007953, over 15850.00 frames. ], tot_loss[loss=0.06755, simple_loss=0.09157, pruned_loss=0.01275, audio_tagging_loss=0.009013, over 3048786.20 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:58:05,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3080753.3333333335, ans=0.2 2023-11-25 21:58:33,400 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.98 vs. 
limit=12.0 2023-11-25 21:58:33,977 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462150 2023-11-25 21:58:34,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3080953.3333333335, ans=0.125 2023-11-25 21:58:38,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3081020.0, ans=0.0 2023-11-25 21:58:39,642 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5250, loss[loss=0.05294, simple_loss=0.06431, pruned_loss=0.009389, audio_tagging_loss=0.0114, over 15034.00 frames. ], tot_loss[loss=0.06786, simple_loss=0.09194, pruned_loss=0.01289, audio_tagging_loss=0.008995, over 3042952.67 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:59:14,198 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.326e+01 8.546e+01 9.251e+01 9.912e+01 1.159e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-25 21:59:29,358 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462200 2023-11-25 21:59:30,977 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.77 vs. limit=22.5 2023-11-25 21:59:34,794 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5300, loss[loss=0.07649, simple_loss=0.1005, pruned_loss=0.01906, audio_tagging_loss=0.007167, over 14892.00 frames. ], tot_loss[loss=0.06807, simple_loss=0.09226, pruned_loss=0.01299, audio_tagging_loss=0.008949, over 3041735.17 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:59:35,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3081353.3333333335, ans=0.0 2023-11-25 21:59:59,621 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.52 vs. limit=12.0 2023-11-25 22:00:00,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3081486.6666666665, ans=0.125 2023-11-25 22:00:03,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3081486.6666666665, ans=0.015 2023-11-25 22:00:13,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3081553.3333333335, ans=0.2 2023-11-25 22:00:23,665 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462250 2023-11-25 22:00:29,310 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5350, loss[loss=0.06693, simple_loss=0.08668, pruned_loss=0.01079, audio_tagging_loss=0.0128, over 15329.00 frames. ], tot_loss[loss=0.06854, simple_loss=0.09283, pruned_loss=0.01317, audio_tagging_loss=0.008955, over 3047473.57 frames. 
], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 22:00:35,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3081686.6666666665, ans=0.2 2023-11-25 22:00:54,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3081820.0, ans=0.07 2023-11-25 22:00:55,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3081820.0, ans=0.125 2023-11-25 22:01:04,028 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 8.802e+01 9.245e+01 1.006e+02 1.324e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-25 22:01:06,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3081886.6666666665, ans=0.125 2023-11-25 22:01:18,112 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462300 2023-11-25 22:01:20,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3081953.3333333335, ans=0.2 2023-11-25 22:01:23,233 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5400, loss[loss=0.07104, simple_loss=0.09114, pruned_loss=0.01618, audio_tagging_loss=0.009292, over 14860.00 frames. ], tot_loss[loss=0.0692, simple_loss=0.09376, pruned_loss=0.01339, audio_tagging_loss=0.00893, over 3047759.19 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 22:01:32,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3082020.0, ans=0.0 2023-11-25 22:01:40,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3082086.6666666665, ans=0.0 2023-11-25 22:01:43,281 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. limit=6.0 2023-11-25 22:02:01,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3082220.0, ans=0.1 2023-11-25 22:02:02,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3082220.0, ans=0.2 2023-11-25 22:02:13,181 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462350 2023-11-25 22:02:18,824 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5450, loss[loss=0.05396, simple_loss=0.06425, pruned_loss=0.00868, audio_tagging_loss=0.01316, over 15155.00 frames. ], tot_loss[loss=0.06848, simple_loss=0.09264, pruned_loss=0.01312, audio_tagging_loss=0.009044, over 3047110.04 frames. 
], batch size: 60, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 22:02:46,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3082486.6666666665, ans=0.125 2023-11-25 22:02:46,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3082486.6666666665, ans=0.125 2023-11-25 22:02:53,084 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.422e+01 8.739e+01 9.442e+01 1.018e+02 1.459e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-25 22:03:00,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3082553.3333333335, ans=0.125 2023-11-25 22:03:07,611 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462400 2023-11-25 22:03:12,951 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5500, loss[loss=0.06493, simple_loss=0.09412, pruned_loss=0.01167, audio_tagging_loss=0.006201, over 14911.00 frames. ], tot_loss[loss=0.06829, simple_loss=0.09217, pruned_loss=0.01313, audio_tagging_loss=0.009075, over 3048090.17 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 22:03:15,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3082686.6666666665, ans=0.1 2023-11-25 22:03:27,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3082753.3333333335, ans=0.125 2023-11-25 22:04:02,251 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462450 2023-11-25 22:04:02,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3082953.3333333335, ans=0.125 2023-11-25 22:04:04,677 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.25 vs. limit=15.0 2023-11-25 22:04:07,428 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5550, loss[loss=0.09262, simple_loss=0.1152, pruned_loss=0.02434, audio_tagging_loss=0.01068, over 14802.00 frames. ], tot_loss[loss=0.06813, simple_loss=0.09161, pruned_loss=0.01312, audio_tagging_loss=0.009209, over 3046838.72 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 22:04:22,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3083086.6666666665, ans=0.125 2023-11-25 22:04:42,726 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.648e+01 9.293e+01 9.970e+01 1.288e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-25 22:04:46,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3083220.0, ans=0.2 2023-11-25 22:04:56,913 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462500 2023-11-25 22:05:03,027 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5600, loss[loss=0.06554, simple_loss=0.09644, pruned_loss=0.0104, audio_tagging_loss=0.006923, over 14544.00 frames. ], tot_loss[loss=0.06809, simple_loss=0.09166, pruned_loss=0.01301, audio_tagging_loss=0.00925, over 3039472.50 frames. 
], batch size: 52, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 22:05:29,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=3083486.6666666665, ans=0.05 2023-11-25 22:05:43,052 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 22:05:51,818 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462550 2023-11-25 22:05:56,946 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5650, loss[loss=0.06773, simple_loss=0.08759, pruned_loss=0.01372, audio_tagging_loss=0.01022, over 15081.00 frames. ], tot_loss[loss=0.06787, simple_loss=0.09127, pruned_loss=0.01298, audio_tagging_loss=0.009259, over 3045818.58 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 22:06:01,482 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:06:18,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3083820.0, ans=0.1 2023-11-25 22:06:23,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3083820.0, ans=0.125 2023-11-25 22:06:23,966 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.82 vs. limit=15.0 2023-11-25 22:06:31,734 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.47 vs. limit=10.0 2023-11-25 22:06:32,138 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.554e+01 9.201e+01 9.858e+01 1.570e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-25 22:06:45,784 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462600 2023-11-25 22:06:51,779 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5700, loss[loss=0.07031, simple_loss=0.09252, pruned_loss=0.01365, audio_tagging_loss=0.0104, over 15142.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.08932, pruned_loss=0.01273, audio_tagging_loss=0.009412, over 3040611.35 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 22:07:08,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3084086.6666666665, ans=0.125 2023-11-25 22:07:09,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3084086.6666666665, ans=0.125 2023-11-25 22:07:15,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3084153.3333333335, ans=0.125 2023-11-25 22:07:18,895 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.78 vs. 
limit=15.0 2023-11-25 22:07:32,020 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:07:41,336 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462650 2023-11-25 22:07:46,930 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5750, loss[loss=0.05534, simple_loss=0.07725, pruned_loss=0.008266, audio_tagging_loss=0.008443, over 15536.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.08838, pruned_loss=0.01265, audio_tagging_loss=0.00936, over 3043912.13 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 22:08:03,705 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.02 vs. limit=15.0 2023-11-25 22:08:06,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3084420.0, ans=0.125 2023-11-25 22:08:21,774 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.113e+01 8.632e+01 9.114e+01 9.911e+01 1.968e+02, threshold=1.823e+02, percent-clipped=1.0 2023-11-25 22:08:28,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3084553.3333333335, ans=0.125 2023-11-25 22:08:36,347 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462700 2023-11-25 22:08:41,433 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5800, loss[loss=0.08073, simple_loss=0.1108, pruned_loss=0.015, audio_tagging_loss=0.01035, over 14310.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08855, pruned_loss=0.01268, audio_tagging_loss=0.009266, over 3039488.71 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:08:42,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3084686.6666666665, ans=0.1 2023-11-25 22:08:42,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3084686.6666666665, ans=0.125 2023-11-25 22:08:51,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3084753.3333333335, ans=0.1 2023-11-25 22:09:04,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3084820.0, ans=0.125 2023-11-25 22:09:30,153 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462750 2023-11-25 22:09:35,304 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5850, loss[loss=0.05878, simple_loss=0.06929, pruned_loss=0.01194, audio_tagging_loss=0.01219, over 15687.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.08918, pruned_loss=0.01264, audio_tagging_loss=0.009196, over 3040092.77 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:09:53,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3085086.6666666665, ans=0.1 2023-11-25 22:09:58,061 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.79 vs. 
limit=15.0 2023-11-25 22:10:12,171 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.279e+01 8.550e+01 9.214e+01 9.901e+01 1.645e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-25 22:10:12,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3085220.0, ans=0.2 2023-11-25 22:10:15,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3085220.0, ans=0.125 2023-11-25 22:10:24,287 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462800 2023-11-25 22:10:30,185 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5900, loss[loss=0.1128, simple_loss=0.1629, pruned_loss=0.02746, audio_tagging_loss=0.00389, over 15918.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09005, pruned_loss=0.0127, audio_tagging_loss=0.009096, over 3044324.39 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:10:30,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3085353.3333333335, ans=0.125 2023-11-25 22:10:34,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3085353.3333333335, ans=0.1 2023-11-25 22:10:37,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3085353.3333333335, ans=0.125 2023-11-25 22:10:42,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3085420.0, ans=0.1 2023-11-25 22:10:43,047 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.32 vs. limit=15.0 2023-11-25 22:10:50,243 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.33 vs. limit=15.0 2023-11-25 22:11:02,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3085553.3333333335, ans=0.2 2023-11-25 22:11:19,902 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462850 2023-11-25 22:11:24,928 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 5950, loss[loss=0.07132, simple_loss=0.1001, pruned_loss=0.01182, audio_tagging_loss=0.009436, over 15069.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.08987, pruned_loss=0.01277, audio_tagging_loss=0.009095, over 3047518.21 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:11:50,151 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.39 vs. 
limit=15.0 2023-11-25 22:11:54,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3085820.0, ans=0.2 2023-11-25 22:12:01,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3085886.6666666665, ans=0.125 2023-11-25 22:12:02,508 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.382e+01 8.667e+01 9.136e+01 9.803e+01 1.331e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-25 22:12:02,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3085886.6666666665, ans=0.125 2023-11-25 22:12:14,026 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462900 2023-11-25 22:12:19,208 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6000, loss[loss=0.06993, simple_loss=0.09331, pruned_loss=0.01357, audio_tagging_loss=0.009699, over 14216.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09028, pruned_loss=0.01272, audio_tagging_loss=0.009062, over 3043144.61 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:12:19,211 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-25 22:12:35,393 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.0704, 3.8452, 3.3651, 3.6962], device='cuda:0') 2023-11-25 22:12:46,615 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.9360, 1.5243, 3.4602, 3.0487, 2.8140, 3.1764, 3.1133, 3.2235], device='cuda:0') 2023-11-25 22:12:50,936 INFO [train_asr.py:1267] (0/4) Epoch 39, validation: loss=0.05816, simple_loss=0.05073, pruned_loss=0.00518, audio_tagging_loss=0.02762, over 4681554.00 frames. 2023-11-25 22:12:50,936 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-25 22:12:51,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3086020.0, ans=0.2 2023-11-25 22:13:02,637 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.47 vs. limit=12.0 2023-11-25 22:13:25,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3086220.0, ans=0.0 2023-11-25 22:13:31,823 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 22:13:40,646 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 462950 2023-11-25 22:13:43,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3086286.6666666665, ans=0.125 2023-11-25 22:13:45,809 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6050, loss[loss=0.05537, simple_loss=0.07322, pruned_loss=0.007702, audio_tagging_loss=0.01106, over 15649.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09012, pruned_loss=0.01254, audio_tagging_loss=0.008996, over 3046617.53 frames. 
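], batch size: 60, lr: 1.73e-03, grad_scale: 16.0

The bracketed loss entries report a per-batch loss and a running total built from three parts: simple_loss and pruned_loss from the pruned transducer, plus audio_tagging_loss from the audio-tagging head. The weights in the sketch below are inferred from the logged numbers themselves (a 0.5 weight on simple_loss and unit weights on the rest reproduce them), not read from train_asr.py:

    def total_loss(simple_loss, pruned_loss, audio_tagging_loss,
                   simple_scale=0.5, audio_tagging_scale=1.0):
        # Inferred combination: 0.5 * 0.09012 + 0.01254 + 1.0 * 0.008996
        # ~= 0.0666, matching the tot_loss for epoch 39, batch 6050 above.
        return (simple_scale * simple_loss
                + pruned_loss
                + audio_tagging_scale * audio_tagging_loss)
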
2023-11-25 22:14:19,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3086553.3333333335, ans=0.05 2023-11-25 22:14:23,447 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.667e+01 8.707e+01 9.356e+01 1.011e+02 1.518e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-25 22:14:30,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3086620.0, ans=0.1 2023-11-25 22:14:33,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3086620.0, ans=0.0 2023-11-25 22:14:35,147 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463000 2023-11-25 22:14:37,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3086620.0, ans=0.125 2023-11-25 22:14:38,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3086620.0, ans=0.07 2023-11-25 22:14:40,564 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6100, loss[loss=0.07548, simple_loss=0.08458, pruned_loss=0.02205, audio_tagging_loss=0.01114, over 14275.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09125, pruned_loss=0.01267, audio_tagging_loss=0.008906, over 3045524.51 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:14:56,128 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.07 vs. limit=15.0 2023-11-25 22:15:09,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3086820.0, ans=0.125 2023-11-25 22:15:16,758 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.10 vs. limit=15.0 2023-11-25 22:15:24,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3086953.3333333335, ans=0.125 2023-11-25 22:15:29,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3086953.3333333335, ans=0.2 2023-11-25 22:15:29,949 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463050 2023-11-25 22:15:36,143 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6150, loss[loss=0.05927, simple_loss=0.07365, pruned_loss=0.01144, audio_tagging_loss=0.01101, over 15153.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09042, pruned_loss=0.01263, audio_tagging_loss=0.008991, over 3049886.45 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:15:41,924 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.39 vs.
limit=15.0 2023-11-25 22:15:48,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3087086.6666666665, ans=0.125 2023-11-25 22:15:57,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3087153.3333333335, ans=0.2 2023-11-25 22:16:01,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3087153.3333333335, ans=0.125 2023-11-25 22:16:04,776 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.22 vs. limit=22.5 2023-11-25 22:16:09,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3087220.0, ans=0.1 2023-11-25 22:16:14,305 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.737e+01 8.733e+01 9.242e+01 9.873e+01 1.239e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-25 22:16:20,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3087286.6666666665, ans=0.125 2023-11-25 22:16:26,395 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463100 2023-11-25 22:16:28,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3087286.6666666665, ans=0.125 2023-11-25 22:16:31,579 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6200, loss[loss=0.0607, simple_loss=0.08449, pruned_loss=0.01084, audio_tagging_loss=0.007623, over 16484.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.08992, pruned_loss=0.01242, audio_tagging_loss=0.009062, over 3051296.55 frames. ], batch size: 64, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:16:38,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3087353.3333333335, ans=0.125 2023-11-25 22:17:01,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3087486.6666666665, ans=0.1 2023-11-25 22:17:17,890 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.30 vs. limit=15.0 2023-11-25 22:17:20,729 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463150 2023-11-25 22:17:24,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3087686.6666666665, ans=0.1 2023-11-25 22:17:25,794 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6250, loss[loss=0.06485, simple_loss=0.082, pruned_loss=0.01258, audio_tagging_loss=0.01127, over 15829.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08948, pruned_loss=0.01238, audio_tagging_loss=0.009161, over 3050252.74 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:17:31,742 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.04 vs. 
limit=15.0 2023-11-25 22:17:39,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3087753.3333333335, ans=0.5 2023-11-25 22:18:04,819 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.384e+01 8.687e+01 9.120e+01 9.665e+01 2.497e+02, threshold=1.824e+02, percent-clipped=1.0 2023-11-25 22:18:08,780 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.69 vs. limit=8.0 2023-11-25 22:18:15,269 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463200 2023-11-25 22:18:21,356 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6300, loss[loss=0.07442, simple_loss=0.09502, pruned_loss=0.01746, audio_tagging_loss=0.009457, over 15143.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.08947, pruned_loss=0.01255, audio_tagging_loss=0.009336, over 3049048.40 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:18:37,770 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:18:41,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3088086.6666666665, ans=0.5 2023-11-25 22:18:44,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3088153.3333333335, ans=0.125 2023-11-25 22:18:45,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3088153.3333333335, ans=0.125 2023-11-25 22:19:06,521 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.08 vs. limit=22.5 2023-11-25 22:19:08,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3088286.6666666665, ans=0.0 2023-11-25 22:19:08,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3088286.6666666665, ans=0.125 2023-11-25 22:19:10,889 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:19:11,729 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463250 2023-11-25 22:19:16,998 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6350, loss[loss=0.06995, simple_loss=0.1016, pruned_loss=0.01117, audio_tagging_loss=0.008011, over 15694.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.08998, pruned_loss=0.01262, audio_tagging_loss=0.009375, over 3048771.30 frames. 
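], batch size: 59, lr: 1.73e-03, grad_scale: 8.0

In the optim.py lines, the five grad-norm numbers read as the min/25%/median/75%/max of recently observed gradient norms, and the printed threshold consistently equals Clipping_scale (2.0) times the median (e.g. 2.0 * 9.120e+01 = 1.824e+02 in the entry above, whose 2.497e+02 maximum exceeded it and shows up in percent-clipped). A sketch of that rule, inferred from the logged numbers rather than copied from the optimizer:

    import torch

    def clip_threshold(recent_grad_norms, clipping_scale=2.0):
        norms = torch.tensor(recent_grad_norms)
        quartiles = torch.quantile(
            norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        # Inferred rule: threshold = Clipping_scale x median of recent norms.
        threshold = clipping_scale * quartiles[2]
        # Fraction of recent batches whose gradient norm exceeded it.
        percent_clipped = 100.0 * (norms > threshold).float().mean()
        return quartiles, threshold, percent_clipped
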
2023-11-25 22:19:17,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3088353.3333333335, ans=0.1 2023-11-25 22:19:18,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3088353.3333333335, ans=0.1 2023-11-25 22:19:22,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3088353.3333333335, ans=0.0 2023-11-25 22:19:36,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3088420.0, ans=0.125 2023-11-25 22:19:41,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3088486.6666666665, ans=0.1 2023-11-25 22:19:54,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3088553.3333333335, ans=0.2 2023-11-25 22:19:56,125 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.178e+01 8.465e+01 9.257e+01 9.775e+01 1.191e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-25 22:20:06,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3088620.0, ans=0.125 2023-11-25 22:20:07,189 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463300 2023-11-25 22:20:10,735 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.78 vs. limit=10.0 2023-11-25 22:20:11,607 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:20:12,109 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.64 vs. limit=15.0 2023-11-25 22:20:12,516 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6400, loss[loss=0.06914, simple_loss=0.08927, pruned_loss=0.01525, audio_tagging_loss=0.009256, over 14687.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.08992, pruned_loss=0.01272, audio_tagging_loss=0.00941, over 3042830.52 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:20:35,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3088820.0, ans=0.125 2023-11-25 22:20:44,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3088886.6666666665, ans=0.125 2023-11-25 22:20:48,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3088886.6666666665, ans=0.125 2023-11-25 22:20:51,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3088886.6666666665, ans=0.09899494936611666 2023-11-25 22:20:54,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3088886.6666666665, ans=0.125 2023-11-25 22:20:59,791 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.18 vs.
limit=22.5 2023-11-25 22:21:01,536 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463350 2023-11-25 22:21:02,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3088953.3333333335, ans=0.1 2023-11-25 22:21:06,708 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6450, loss[loss=0.06867, simple_loss=0.08944, pruned_loss=0.01346, audio_tagging_loss=0.0105, over 15968.00 frames. ], tot_loss[loss=0.06759, simple_loss=0.09058, pruned_loss=0.0128, audio_tagging_loss=0.0095, over 3040226.93 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:21:39,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3089220.0, ans=0.0 2023-11-25 22:21:43,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3089220.0, ans=0.125 2023-11-25 22:21:45,550 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.108e+01 8.513e+01 9.181e+01 1.004e+02 1.135e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-25 22:21:53,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3089286.6666666665, ans=0.125 2023-11-25 22:21:57,001 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463400 2023-11-25 22:22:03,083 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6500, loss[loss=0.0683, simple_loss=0.09868, pruned_loss=0.01155, audio_tagging_loss=0.007409, over 15123.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09005, pruned_loss=0.01263, audio_tagging_loss=0.009509, over 3045521.26 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:22:12,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3089353.3333333335, ans=0.035 2023-11-25 22:22:21,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3089420.0, ans=0.2 2023-11-25 22:22:22,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3089420.0, ans=0.125 2023-11-25 22:22:27,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3089486.6666666665, ans=0.0 2023-11-25 22:22:34,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3089553.3333333335, ans=0.125 2023-11-25 22:22:34,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3089553.3333333335, ans=0.125 2023-11-25 22:22:37,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3089553.3333333335, ans=0.0 2023-11-25 22:22:47,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=3089620.0, ans=22.5 2023-11-25 22:22:50,611 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.02 vs. 
limit=22.5 2023-11-25 22:22:51,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3089620.0, ans=0.07 2023-11-25 22:22:52,364 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463450 2023-11-25 22:22:58,095 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6550, loss[loss=0.06676, simple_loss=0.09647, pruned_loss=0.009131, audio_tagging_loss=0.009396, over 14946.00 frames. ], tot_loss[loss=0.06765, simple_loss=0.09087, pruned_loss=0.01284, audio_tagging_loss=0.009382, over 3050751.65 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:23:15,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3089753.3333333335, ans=0.0 2023-11-25 22:23:35,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3089886.6666666665, ans=0.125 2023-11-25 22:23:36,508 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.082e+01 8.654e+01 9.097e+01 9.635e+01 1.212e+02, threshold=1.819e+02, percent-clipped=0.0 2023-11-25 22:23:37,046 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.21 vs. limit=15.0 2023-11-25 22:23:39,561 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.44 vs. limit=15.0 2023-11-25 22:23:47,525 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463500 2023-11-25 22:23:47,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3089953.3333333335, ans=0.07 2023-11-25 22:23:52,786 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6600, loss[loss=0.05314, simple_loss=0.066, pruned_loss=0.01208, audio_tagging_loss=0.008061, over 15275.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.09058, pruned_loss=0.01277, audio_tagging_loss=0.009235, over 3044813.25 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:23:59,032 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.016e-03 2023-11-25 22:23:59,366 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.67 vs. limit=15.0 2023-11-25 22:24:12,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3090086.6666666665, ans=0.1 2023-11-25 22:24:25,009 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.55 vs. limit=15.0 2023-11-25 22:24:35,435 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.99 vs. limit=15.0 2023-11-25 22:24:43,433 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463550 2023-11-25 22:24:49,278 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6650, loss[loss=0.08505, simple_loss=0.1075, pruned_loss=0.02424, audio_tagging_loss=0.00704, over 15607.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.0895, pruned_loss=0.01267, audio_tagging_loss=0.009217, over 3041541.88 frames. 
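], batch size: 56, lr: 1.73e-03, grad_scale: 16.0

Each ScheduledFloat line prints a hyperparameter ("ans") whose value is a function of the global batch_count, typically interpolated between a few (batch, value) knots and held constant beyond the last one. A toy illustration of that idea, not the scaling.py implementation:

    def scheduled_float(batch_count, knots):
        # knots: sorted (batch, value) pairs, e.g. [(0.0, 0.3), (20000.0, 0.125)]
        if batch_count <= knots[0][0]:
            return knots[0][1]
        for (b0, v0), (b1, v1) in zip(knots, knots[1:]):
            if batch_count <= b1:
                # Linear interpolation between neighbouring knots.
                return v0 + (batch_count - b0) / (b1 - b0) * (v1 - v0)
        return knots[-1][1]

    # At batch_count ~ 3.09e6 every schedule in this log is long past its
    # final knot, so the printed values (0.0, 0.1, 0.125, 0.2, ...) are flat.
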
2023-11-25 22:25:07,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3090420.0, ans=0.0 2023-11-25 22:25:21,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3090553.3333333335, ans=0.125 2023-11-25 22:25:27,776 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.165e+01 8.685e+01 9.321e+01 9.946e+01 1.416e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-25 22:25:38,699 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463600 2023-11-25 22:25:38,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3090620.0, ans=0.95 2023-11-25 22:25:44,052 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6700, loss[loss=0.06661, simple_loss=0.08206, pruned_loss=0.01193, audio_tagging_loss=0.01365, over 15791.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.08958, pruned_loss=0.01254, audio_tagging_loss=0.00914, over 3042420.22 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:25:56,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3090753.3333333335, ans=0.1 2023-11-25 22:26:00,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3090753.3333333335, ans=0.125 2023-11-25 22:26:10,217 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.95 vs. limit=22.5 2023-11-25 22:26:28,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3090953.3333333335, ans=0.0 2023-11-25 22:26:29,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3090953.3333333335, ans=0.125 2023-11-25 22:26:33,554 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463650 2023-11-25 22:26:33,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3090953.3333333335, ans=0.1 2023-11-25 22:26:38,721 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6750, loss[loss=0.06657, simple_loss=0.08852, pruned_loss=0.01294, audio_tagging_loss=0.009372, over 14847.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09019, pruned_loss=0.01259, audio_tagging_loss=0.009031, over 3045097.67 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:27:02,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3091153.3333333335, ans=0.125 2023-11-25 22:27:02,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3091153.3333333335, ans=0.1 2023-11-25 22:27:03,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3091153.3333333335, ans=0.125 2023-11-25 22:27:11,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3091220.0, ans=0.2 2023-11-25 22:27:15,991 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.50 vs.
limit=22.5 2023-11-25 22:27:17,263 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.384e+01 8.613e+01 9.173e+01 9.716e+01 1.152e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-25 22:27:28,299 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463700 2023-11-25 22:27:33,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3091353.3333333335, ans=0.125 2023-11-25 22:27:33,914 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6800, loss[loss=0.07298, simple_loss=0.1056, pruned_loss=0.0107, audio_tagging_loss=0.009468, over 15194.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09061, pruned_loss=0.01264, audio_tagging_loss=0.008999, over 3043378.59 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:27:50,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3091420.0, ans=0.2 2023-11-25 22:27:57,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3091486.6666666665, ans=0.125 2023-11-25 22:28:05,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3091553.3333333335, ans=0.125 2023-11-25 22:28:13,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3091553.3333333335, ans=0.125 2023-11-25 22:28:15,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3091553.3333333335, ans=0.125 2023-11-25 22:28:15,199 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.71 vs. limit=22.5 2023-11-25 22:28:23,737 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463750 2023-11-25 22:28:24,266 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.94 vs. limit=15.0 2023-11-25 22:28:25,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3091620.0, ans=0.0 2023-11-25 22:28:28,537 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.74 vs. limit=15.0 2023-11-25 22:28:28,852 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6850, loss[loss=0.06874, simple_loss=0.08926, pruned_loss=0.01334, audio_tagging_loss=0.01077, over 15326.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09085, pruned_loss=0.01252, audio_tagging_loss=0.008956, over 3038950.18 frames. 
], batch size: 58, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:28:36,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3091686.6666666665, ans=0.0 2023-11-25 22:28:37,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3091686.6666666665, ans=0.125 2023-11-25 22:28:42,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3091753.3333333335, ans=0.0 2023-11-25 22:28:45,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3091753.3333333335, ans=0.0 2023-11-25 22:28:53,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3091820.0, ans=0.125 2023-11-25 22:29:07,615 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.919e+01 8.654e+01 9.393e+01 1.015e+02 1.220e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-25 22:29:08,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3091886.6666666665, ans=0.0 2023-11-25 22:29:18,089 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463800 2023-11-25 22:29:23,603 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6900, loss[loss=0.06667, simple_loss=0.08926, pruned_loss=0.01465, audio_tagging_loss=0.007395, over 16825.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09081, pruned_loss=0.01261, audio_tagging_loss=0.009019, over 3039093.70 frames. ], batch size: 64, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:29:29,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3092020.0, ans=0.125 2023-11-25 22:29:50,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3092153.3333333335, ans=0.125 2023-11-25 22:30:08,456 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 22:30:13,785 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463850 2023-11-25 22:30:20,046 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 6950, loss[loss=0.09072, simple_loss=0.1211, pruned_loss=0.02128, audio_tagging_loss=0.008878, over 16033.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.091, pruned_loss=0.01277, audio_tagging_loss=0.008979, over 3040778.36 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:30:34,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3092420.0, ans=0.0 2023-11-25 22:30:34,678 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.49 vs. 
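limit=22.5

The Whitening lines compare a statistic of a module's activations ("metric") against a limit; a value near 1 indicates nearly white (decorrelated, equal-variance) features, and exceeding the limit triggers a corrective gradient. One plausible form of such a statistic, offered only as a guess at the flavour of the measurement rather than the scaling.py code, is the mean squared eigenvalue of the feature covariance over the squared mean eigenvalue:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels) activations for one group
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        # Equals 1.0 for perfectly white features; grows when a few
        # directions dominate the variance.
        return (eigs ** 2).mean() / eigs.mean() ** 2
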
2023-11-25 22:30:38,516 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:30:43,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3092486.6666666665, ans=0.1 2023-11-25 22:30:45,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3092486.6666666665, ans=0.125 2023-11-25 22:30:53,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3092553.3333333335, ans=0.125 2023-11-25 22:30:58,402 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.370e+01 8.697e+01 9.205e+01 9.794e+01 1.442e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-25 22:31:04,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3092620.0, ans=0.1 2023-11-25 22:31:08,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3092620.0, ans=0.125 2023-11-25 22:31:09,789 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463900 2023-11-25 22:31:15,081 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7000, loss[loss=0.06532, simple_loss=0.09137, pruned_loss=0.0124, audio_tagging_loss=0.007234, over 14611.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.0908, pruned_loss=0.01268, audio_tagging_loss=0.009036, over 3034896.07 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:31:23,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3092686.6666666665, ans=0.04949747468305833 2023-11-25 22:32:00,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3092953.3333333335, ans=0.1 2023-11-25 22:32:04,329 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 463950 2023-11-25 22:32:09,464 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7050, loss[loss=0.06458, simple_loss=0.08164, pruned_loss=0.0117, audio_tagging_loss=0.01206, over 14548.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09053, pruned_loss=0.01264, audio_tagging_loss=0.009051, over 3036487.12 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:32:25,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3093086.6666666665, ans=0.0 2023-11-25 22:32:48,073 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.379e+01 8.460e+01 9.019e+01 9.979e+01 1.338e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-25 22:32:50,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3093220.0, ans=0.1 2023-11-25 22:32:54,119 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.97 vs.
limit=15.0 2023-11-25 22:32:58,705 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464000 2023-11-25 22:33:00,031 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-464000.pt 2023-11-25 22:33:07,450 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7100, loss[loss=0.1022, simple_loss=0.1317, pruned_loss=0.02811, audio_tagging_loss=0.008215, over 16313.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09005, pruned_loss=0.01261, audio_tagging_loss=0.009161, over 3038374.52 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:33:16,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3093353.3333333335, ans=0.1 2023-11-25 22:33:44,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3093553.3333333335, ans=0.125 2023-11-25 22:33:57,208 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464050 2023-11-25 22:34:02,475 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7150, loss[loss=0.07039, simple_loss=0.09698, pruned_loss=0.01356, audio_tagging_loss=0.008346, over 14934.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.0898, pruned_loss=0.01256, audio_tagging_loss=0.009181, over 3037175.62 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:34:40,746 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.230e+01 8.669e+01 9.271e+01 1.002e+02 1.351e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-25 22:34:51,298 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464100 2023-11-25 22:34:56,587 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7200, loss[loss=0.07097, simple_loss=0.09335, pruned_loss=0.01544, audio_tagging_loss=0.008851, over 14877.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09079, pruned_loss=0.0128, audio_tagging_loss=0.009259, over 3044263.97 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:35:11,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3094086.6666666665, ans=0.0 2023-11-25 22:35:23,322 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.16 vs. limit=15.0 2023-11-25 22:35:45,906 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464150 2023-11-25 22:35:51,593 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7250, loss[loss=0.07095, simple_loss=0.09556, pruned_loss=0.01409, audio_tagging_loss=0.009076, over 15126.00 frames. ], tot_loss[loss=0.06778, simple_loss=0.09135, pruned_loss=0.01284, audio_tagging_loss=0.009268, over 3042492.86 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:35:57,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3094353.3333333335, ans=0.1 2023-11-25 22:35:59,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3094353.3333333335, ans=0.1 2023-11-25 22:36:21,189 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.03 vs. 
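limit=10.0

The checkpoint.py entry above lands exactly at batch 464000, consistent with a save-every-N-batches rule on the global batch index (464000 is a multiple of an assumed 4000-batch interval). A hedged sketch follows; the helper name and dictionary layout are illustrative, not icefall's actual checkpoint.py API:

    import torch

    def maybe_save(model, optimizer, batch_idx, exp_dir, save_every_n=4000):
        # Periodic mid-epoch checkpoint, e.g. .../checkpoint-464000.pt
        if batch_idx > 0 and batch_idx % save_every_n == 0:
            torch.save(
                {
                    "model": model.state_dict(),
                    "optimizer": optimizer.state_dict(),
                    "batch_idx_train": batch_idx,
                },
                f"{exp_dir}/checkpoint-{batch_idx}.pt",
            )
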
2023-11-25 22:36:28,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3094553.3333333335, ans=0.125 2023-11-25 22:36:28,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3094553.3333333335, ans=0.125 2023-11-25 22:36:28,562 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=15.0 2023-11-25 22:36:31,152 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.617e+01 8.827e+01 9.307e+01 1.005e+02 1.461e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-25 22:36:33,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3094553.3333333335, ans=0.125 2023-11-25 22:36:42,629 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464200 2023-11-25 22:36:48,011 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7300, loss[loss=0.04137, simple_loss=0.05173, pruned_loss=0.006286, audio_tagging_loss=0.009223, over 13977.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09043, pruned_loss=0.01275, audio_tagging_loss=0.009196, over 3042436.85 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:37:05,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3094753.3333333335, ans=0.0 2023-11-25 22:37:09,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=3094820.0, ans=0.02 2023-11-25 22:37:27,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3094886.6666666665, ans=0.0 2023-11-25 22:37:35,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3094953.3333333335, ans=0.125 2023-11-25 22:37:37,125 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464250 2023-11-25 22:37:37,520 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.36 vs. limit=15.0 2023-11-25 22:37:41,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3095020.0, ans=0.1 2023-11-25 22:37:42,318 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7350, loss[loss=0.06114, simple_loss=0.07876, pruned_loss=0.009635, audio_tagging_loss=0.01213, over 14764.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09032, pruned_loss=0.01267, audio_tagging_loss=0.009005, over 3047380.91 frames.
], batch size: 56, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:37:42,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3095020.0, ans=0.125 2023-11-25 22:37:43,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3095020.0, ans=0.0 2023-11-25 22:37:52,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3095086.6666666665, ans=0.1 2023-11-25 22:38:05,313 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.26 vs. limit=15.0 2023-11-25 22:38:15,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3095220.0, ans=0.125 2023-11-25 22:38:23,149 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.376e+01 8.699e+01 9.551e+01 1.020e+02 2.458e+02, threshold=1.910e+02, percent-clipped=1.0 2023-11-25 22:38:31,660 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464300 2023-11-25 22:38:36,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3095353.3333333335, ans=0.125 2023-11-25 22:38:36,942 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7400, loss[loss=0.06274, simple_loss=0.0852, pruned_loss=0.009742, audio_tagging_loss=0.0104, over 14060.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.08978, pruned_loss=0.01256, audio_tagging_loss=0.008973, over 3044388.56 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:38:42,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3095353.3333333335, ans=0.07 2023-11-25 22:38:49,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3095420.0, ans=0.2 2023-11-25 22:38:58,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3095486.6666666665, ans=0.0 2023-11-25 22:39:07,310 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.66 vs. limit=15.0 2023-11-25 22:39:09,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3095553.3333333335, ans=0.1 2023-11-25 22:39:13,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3095553.3333333335, ans=0.125 2023-11-25 22:39:16,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3095553.3333333335, ans=0.1 2023-11-25 22:39:19,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3095620.0, ans=0.125 2023-11-25 22:39:26,764 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464350 2023-11-25 22:39:31,455 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.74 vs. 
limit=12.0 2023-11-25 22:39:32,920 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7450, loss[loss=0.06232, simple_loss=0.08979, pruned_loss=0.01138, audio_tagging_loss=0.006042, over 15418.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09032, pruned_loss=0.01275, audio_tagging_loss=0.008893, over 3049274.17 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:39:45,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3095753.3333333335, ans=0.0 2023-11-25 22:40:04,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3095886.6666666665, ans=0.125 2023-11-25 22:40:13,435 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.982e+01 8.803e+01 9.393e+01 1.013e+02 1.307e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-25 22:40:14,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3095886.6666666665, ans=0.1 2023-11-25 22:40:16,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3095953.3333333335, ans=0.07 2023-11-25 22:40:21,923 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464400 2023-11-25 22:40:26,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3096020.0, ans=0.1 2023-11-25 22:40:27,442 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7500, loss[loss=0.06901, simple_loss=0.09167, pruned_loss=0.01198, audio_tagging_loss=0.0112, over 16714.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09106, pruned_loss=0.01281, audio_tagging_loss=0.008864, over 3057256.62 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:40:31,039 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.13 vs. limit=15.0 2023-11-25 22:40:36,413 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.23 vs. limit=15.0 2023-11-25 22:40:59,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3096153.3333333335, ans=0.0 2023-11-25 22:41:16,964 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464450 2023-11-25 22:41:22,276 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7550, loss[loss=0.06706, simple_loss=0.1015, pruned_loss=0.01102, audio_tagging_loss=0.005302, over 15384.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.0909, pruned_loss=0.01258, audio_tagging_loss=0.008867, over 3056302.71 frames. 
], batch size: 55, lr: 1.73e-03, grad_scale: 8.0
2023-11-25 22:42:01,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3096553.3333333335, ans=0.1
2023-11-25 22:42:02,995 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.821e+01 8.728e+01 9.410e+01 1.018e+02 1.180e+02, threshold=1.882e+02, percent-clipped=0.0
2023-11-25 22:42:03,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3096553.3333333335, ans=0.0
2023-11-25 22:42:12,525 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464500
2023-11-25 22:42:18,119 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7600, loss[loss=0.04422, simple_loss=0.04997, pruned_loss=0.009136, audio_tagging_loss=0.0101, over 14323.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.0907, pruned_loss=0.01247, audio_tagging_loss=0.008873, over 3053731.82 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 22:42:25,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3096686.6666666665, ans=0.125
2023-11-25 22:42:32,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3096753.3333333335, ans=0.09899494936611666
2023-11-25 22:42:35,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3096753.3333333335, ans=0.0
2023-11-25 22:42:38,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3096820.0, ans=0.07
2023-11-25 22:42:45,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3096820.0, ans=0.0
2023-11-25 22:43:00,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3096886.6666666665, ans=0.2
2023-11-25 22:43:06,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3096953.3333333335, ans=0.125
2023-11-25 22:43:07,838 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464550
2023-11-25 22:43:13,160 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7650, loss[loss=0.06231, simple_loss=0.08535, pruned_loss=0.01188, audio_tagging_loss=0.007756, over 16256.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09072, pruned_loss=0.01258, audio_tagging_loss=0.008921, over 3055527.74 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 22:43:13,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3097020.0, ans=0.125
2023-11-25 22:43:14,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3097020.0, ans=0.2
2023-11-25 22:43:16,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3097020.0, ans=0.0
2023-11-25 22:43:26,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=3097086.6666666665, ans=0.05
2023-11-25 22:43:37,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3097153.3333333335, ans=0.0
2023-11-25 22:43:46,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3097220.0, ans=0.125
2023-11-25 22:43:52,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3097220.0, ans=0.125
2023-11-25 22:43:55,112 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.389e+01 8.616e+01 9.118e+01 9.857e+01 1.270e+02, threshold=1.824e+02, percent-clipped=0.0
2023-11-25 22:43:59,822 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.83 vs. limit=15.0
2023-11-25 22:44:01,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3097286.6666666665, ans=0.1
2023-11-25 22:44:02,522 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464600
2023-11-25 22:44:08,206 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7700, loss[loss=0.08325, simple_loss=0.1081, pruned_loss=0.01831, audio_tagging_loss=0.01089, over 15422.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09099, pruned_loss=0.01271, audio_tagging_loss=0.008903, over 3052636.57 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 8.0
2023-11-25 22:44:22,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3097420.0, ans=0.2
2023-11-25 22:44:23,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3097420.0, ans=0.2
2023-11-25 22:44:24,894 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.91 vs. limit=22.5
2023-11-25 22:44:25,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3097420.0, ans=0.125
2023-11-25 22:44:46,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3097553.3333333335, ans=0.1
2023-11-25 22:44:58,674 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464650
2023-11-25 22:45:04,317 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7750, loss[loss=0.09045, simple_loss=0.1198, pruned_loss=0.02304, audio_tagging_loss=0.007501, over 15285.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09081, pruned_loss=0.01265, audio_tagging_loss=0.008999, over 3045859.05 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 8.0
2023-11-25 22:45:05,852 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.72 vs. limit=15.0
2023-11-25 22:45:08,966 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.06 vs. limit=15.0
2023-11-25 22:45:22,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3097753.3333333335, ans=0.125
2023-11-25 22:45:31,584 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0
2023-11-25 22:45:32,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3097820.0, ans=0.1
2023-11-25 22:45:46,215 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.373e+01 8.755e+01 9.240e+01 9.987e+01 1.306e+02, threshold=1.848e+02, percent-clipped=0.0
2023-11-25 22:45:53,739 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464700
2023-11-25 22:45:57,443 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.94 vs. limit=12.0
2023-11-25 22:45:59,398 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7800, loss[loss=0.06571, simple_loss=0.09192, pruned_loss=0.01095, audio_tagging_loss=0.008803, over 16431.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.0905, pruned_loss=0.01264, audio_tagging_loss=0.009046, over 3047164.59 frames. ], batch size: 61, lr: 1.73e-03, grad_scale: 8.0
2023-11-25 22:46:05,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3098020.0, ans=0.125
2023-11-25 22:46:40,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3098220.0, ans=0.2
2023-11-25 22:46:44,000 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.86 vs. limit=15.0
2023-11-25 22:46:48,775 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464750
2023-11-25 22:46:51,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3098286.6666666665, ans=0.125
2023-11-25 22:46:54,005 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7850, loss[loss=0.05906, simple_loss=0.08356, pruned_loss=0.01068, audio_tagging_loss=0.0066, over 15073.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09007, pruned_loss=0.01261, audio_tagging_loss=0.009171, over 3049161.58 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 8.0
2023-11-25 22:46:58,875 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0
2023-11-25 22:47:31,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3098553.3333333335, ans=0.0
2023-11-25 22:47:35,821 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.757e+01 8.791e+01 9.341e+01 1.014e+02 1.334e+02, threshold=1.868e+02, percent-clipped=0.0
2023-11-25 22:47:43,169 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464800
2023-11-25 22:47:49,382 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7900, loss[loss=0.06151, simple_loss=0.08236, pruned_loss=0.009689, audio_tagging_loss=0.01064, over 14759.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09003, pruned_loss=0.01257, audio_tagging_loss=0.009242, over 3052403.45 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 8.0
2023-11-25 22:47:49,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3098686.6666666665, ans=0.0
2023-11-25 22:47:56,782 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.91 vs. limit=12.0
2023-11-25 22:47:57,895 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.98 vs. limit=15.0
2023-11-25 22:48:05,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3098753.3333333335, ans=0.125
2023-11-25 22:48:08,695 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.25 vs. limit=15.0
2023-11-25 22:48:31,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3098886.6666666665, ans=0.0
2023-11-25 22:48:39,183 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464850
2023-11-25 22:48:44,373 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 7950, loss[loss=0.05121, simple_loss=0.06379, pruned_loss=0.007836, audio_tagging_loss=0.01148, over 14585.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.08905, pruned_loss=0.01235, audio_tagging_loss=0.009386, over 3046457.71 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 8.0
2023-11-25 22:48:46,004 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.07 vs. limit=15.0
2023-11-25 22:48:58,467 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 22:49:03,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3099086.6666666665, ans=0.09899494936611666
2023-11-25 22:49:04,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3099153.3333333335, ans=0.1
2023-11-25 22:49:06,393 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.18 vs. limit=22.5
2023-11-25 22:49:13,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3099153.3333333335, ans=0.2
2023-11-25 22:49:15,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3099153.3333333335, ans=0.0
2023-11-25 22:49:15,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3099153.3333333335, ans=0.125
2023-11-25 22:49:19,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3099220.0, ans=0.125
2023-11-25 22:49:20,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3099220.0, ans=0.125
2023-11-25 22:49:26,278 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 8.696e+01 9.334e+01 1.006e+02 1.500e+02, threshold=1.867e+02, percent-clipped=0.0
2023-11-25 22:49:34,213 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464900
2023-11-25 22:49:39,380 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8000, loss[loss=0.03433, simple_loss=0.04099, pruned_loss=0.00334, audio_tagging_loss=0.0105, over 14734.00 frames. ], tot_loss[loss=0.066, simple_loss=0.0884, pruned_loss=0.01225, audio_tagging_loss=0.009549, over 3037632.32 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 22:49:47,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3099353.3333333335, ans=0.125
2023-11-25 22:50:28,810 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 464950
2023-11-25 22:50:34,966 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8050, loss[loss=0.06823, simple_loss=0.09252, pruned_loss=0.01336, audio_tagging_loss=0.008615, over 15201.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08866, pruned_loss=0.01235, audio_tagging_loss=0.00958, over 3043616.91 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 22:50:41,044 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-25 22:50:45,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3099753.3333333335, ans=0.0
2023-11-25 22:50:56,642 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-25 22:51:06,100 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.60 vs. limit=15.0
2023-11-25 22:51:10,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3099886.6666666665, ans=0.125
2023-11-25 22:51:13,195 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.22 vs. limit=10.0
2023-11-25 22:51:14,384 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.30 vs. limit=10.0
2023-11-25 22:51:15,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3099886.6666666665, ans=0.125
2023-11-25 22:51:16,987 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.196e+01 8.622e+01 9.226e+01 9.839e+01 1.205e+02, threshold=1.845e+02, percent-clipped=0.0
2023-11-25 22:51:17,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3099886.6666666665, ans=0.035
2023-11-25 22:51:18,848 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.74 vs. limit=15.0
2023-11-25 22:51:19,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3099953.3333333335, ans=0.07
2023-11-25 22:51:24,907 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465000
2023-11-25 22:51:30,382 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8100, loss[loss=0.07432, simple_loss=0.1116, pruned_loss=0.01166, audio_tagging_loss=0.006837, over 15122.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.08948, pruned_loss=0.01261, audio_tagging_loss=0.009373, over 3039363.24 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 22:51:47,029 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.49 vs. limit=22.5
2023-11-25 22:52:18,962 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0
2023-11-25 22:52:19,610 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465050
2023-11-25 22:52:24,794 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8150, loss[loss=0.07029, simple_loss=0.1049, pruned_loss=0.0103, audio_tagging_loss=0.007542, over 16315.00 frames. ], tot_loss[loss=0.06758, simple_loss=0.09124, pruned_loss=0.0129, audio_tagging_loss=0.009061, over 3050680.83 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 22:52:26,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3100353.3333333335, ans=0.0
2023-11-25 22:52:28,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3100353.3333333335, ans=0.0
2023-11-25 22:52:30,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3100353.3333333335, ans=0.0
2023-11-25 22:53:00,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3100553.3333333335, ans=0.125
2023-11-25 22:53:02,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3100553.3333333335, ans=0.2
2023-11-25 22:53:06,750 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.378e+01 8.506e+01 9.069e+01 1.015e+02 1.632e+02, threshold=1.814e+02, percent-clipped=0.0
2023-11-25 22:53:14,725 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465100
2023-11-25 22:53:20,588 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8200, loss[loss=0.0712, simple_loss=0.1045, pruned_loss=0.01189, audio_tagging_loss=0.00705, over 14918.00 frames. ], tot_loss[loss=0.0674, simple_loss=0.09118, pruned_loss=0.01282, audio_tagging_loss=0.008994, over 3047118.30 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 22:53:22,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=3100686.6666666665, ans=15.0
2023-11-25 22:53:23,156 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 22:53:23,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3100686.6666666665, ans=0.125
2023-11-25 22:53:31,526 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.25 vs. limit=15.0
2023-11-25 22:53:34,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3100753.3333333335, ans=0.125
2023-11-25 22:53:39,946 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.37 vs. limit=15.0
2023-11-25 22:53:40,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3100753.3333333335, ans=0.2
2023-11-25 22:53:44,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3100820.0, ans=0.125
2023-11-25 22:53:56,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3100886.6666666665, ans=0.0
2023-11-25 22:54:07,878 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.19 vs. limit=15.0
2023-11-25 22:54:11,023 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465150
2023-11-25 22:54:16,192 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8250, loss[loss=0.0739, simple_loss=0.0997, pruned_loss=0.01472, audio_tagging_loss=0.009332, over 15304.00 frames. ], tot_loss[loss=0.06803, simple_loss=0.09197, pruned_loss=0.01313, audio_tagging_loss=0.008915, over 3050856.46 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 8.0
2023-11-25 22:54:27,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3101086.6666666665, ans=0.0
2023-11-25 22:54:58,439 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.606e+01 9.259e+01 1.021e+02 1.240e+02, threshold=1.852e+02, percent-clipped=0.0
2023-11-25 22:55:02,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3101286.6666666665, ans=0.125
2023-11-25 22:55:04,737 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465200
2023-11-25 22:55:06,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3101286.6666666665, ans=0.035
2023-11-25 22:55:10,185 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8300, loss[loss=0.05579, simple_loss=0.06854, pruned_loss=0.008841, audio_tagging_loss=0.01269, over 15243.00 frames. ], tot_loss[loss=0.06771, simple_loss=0.09169, pruned_loss=0.01294, audio_tagging_loss=0.008924, over 3049623.33 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 8.0
2023-11-25 22:55:36,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3101486.6666666665, ans=0.0
2023-11-25 22:55:53,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3101620.0, ans=0.125
2023-11-25 22:55:58,982 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465250
2023-11-25 22:56:00,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3101620.0, ans=0.1
2023-11-25 22:56:03,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3101686.6666666665, ans=0.125
2023-11-25 22:56:04,592 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8350, loss[loss=0.07266, simple_loss=0.104, pruned_loss=0.01293, audio_tagging_loss=0.007717, over 15199.00 frames. ], tot_loss[loss=0.068, simple_loss=0.09212, pruned_loss=0.01312, audio_tagging_loss=0.008822, over 3046963.75 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 8.0
2023-11-25 22:56:09,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3101686.6666666665, ans=0.125
2023-11-25 22:56:19,817 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.96 vs. limit=10.0
2023-11-25 22:56:31,243 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.26 vs. limit=10.0
2023-11-25 22:56:46,892 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.088e+01 8.512e+01 9.293e+01 1.012e+02 1.242e+02, threshold=1.859e+02, percent-clipped=0.0
2023-11-25 22:56:47,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3101953.3333333335, ans=0.125
2023-11-25 22:56:54,240 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465300
2023-11-25 22:56:59,880 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8400, loss[loss=0.06257, simple_loss=0.08321, pruned_loss=0.01299, audio_tagging_loss=0.007964, over 15165.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.09081, pruned_loss=0.01292, audio_tagging_loss=0.008901, over 3050732.64 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 22:57:00,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3102020.0, ans=0.125
2023-11-25 22:57:10,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3102086.6666666665, ans=0.0
2023-11-25 22:57:24,458 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.52 vs. limit=15.0
2023-11-25 22:57:48,466 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465350
2023-11-25 22:57:53,639 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8450, loss[loss=0.05915, simple_loss=0.07764, pruned_loss=0.009618, audio_tagging_loss=0.01071, over 14364.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.0911, pruned_loss=0.01289, audio_tagging_loss=0.008933, over 3051292.94 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 22:57:58,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3102353.3333333335, ans=0.0
2023-11-25 22:58:09,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3102420.0, ans=0.125
2023-11-25 22:58:31,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3102553.3333333335, ans=0.0
2023-11-25 22:58:35,876 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 8.915e+01 9.393e+01 9.975e+01 1.301e+02, threshold=1.879e+02, percent-clipped=0.0
2023-11-25 22:58:36,357 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.72 vs. limit=15.0
2023-11-25 22:58:42,292 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465400
2023-11-25 22:58:47,827 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8500, loss[loss=0.06315, simple_loss=0.08423, pruned_loss=0.01192, audio_tagging_loss=0.009115, over 15264.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09127, pruned_loss=0.01291, audio_tagging_loss=0.009077, over 3055888.78 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 22:58:52,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3102686.6666666665, ans=0.1
2023-11-25 22:59:37,815 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465450
2023-11-25 22:59:43,589 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8550, loss[loss=0.04865, simple_loss=0.06551, pruned_loss=0.006756, audio_tagging_loss=0.009144, over 15951.00 frames. ], tot_loss[loss=0.0675, simple_loss=0.09109, pruned_loss=0.01289, audio_tagging_loss=0.009064, over 3053782.25 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 22:59:59,727 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.36 vs. limit=22.5
2023-11-25 23:00:03,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3103153.3333333335, ans=0.0
2023-11-25 23:00:14,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3103220.0, ans=0.95
2023-11-25 23:00:17,584 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-25 23:00:17,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3103220.0, ans=0.125
2023-11-25 23:00:22,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3103220.0, ans=0.125
2023-11-25 23:00:25,865 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.126e+01 8.599e+01 9.050e+01 9.776e+01 1.276e+02, threshold=1.810e+02, percent-clipped=0.0
2023-11-25 23:00:32,217 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465500
2023-11-25 23:00:35,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3103286.6666666665, ans=0.125
2023-11-25 23:00:37,341 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8600, loss[loss=0.05658, simple_loss=0.07915, pruned_loss=0.009166, audio_tagging_loss=0.007837, over 15219.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.0905, pruned_loss=0.01268, audio_tagging_loss=0.009133, over 3045706.04 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 23:00:45,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3103353.3333333335, ans=0.125
2023-11-25 23:00:51,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3103420.0, ans=0.125
2023-11-25 23:01:26,012 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465550
2023-11-25 23:01:31,147 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8650, loss[loss=0.04782, simple_loss=0.06549, pruned_loss=0.007253, audio_tagging_loss=0.007825, over 14658.00 frames. ], tot_loss[loss=0.06744, simple_loss=0.09102, pruned_loss=0.01272, audio_tagging_loss=0.009203, over 3043612.97 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 23:02:00,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3103820.0, ans=0.2
2023-11-25 23:02:08,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3103886.6666666665, ans=0.125
2023-11-25 23:02:13,488 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.107e+01 8.616e+01 9.272e+01 9.852e+01 1.304e+02, threshold=1.854e+02, percent-clipped=0.0
2023-11-25 23:02:14,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3103953.3333333335, ans=0.125
2023-11-25 23:02:20,398 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465600
2023-11-25 23:02:21,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3103953.3333333335, ans=0.125
2023-11-25 23:02:25,782 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8700, loss[loss=0.07297, simple_loss=0.09339, pruned_loss=0.01271, audio_tagging_loss=0.01356, over 15774.00 frames. ], tot_loss[loss=0.0675, simple_loss=0.09115, pruned_loss=0.01274, audio_tagging_loss=0.009191, over 3045908.78 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 23:02:31,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3104020.0, ans=0.2
2023-11-25 23:02:34,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3104020.0, ans=0.125
2023-11-25 23:02:37,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3104086.6666666665, ans=0.125
2023-11-25 23:02:38,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3104086.6666666665, ans=0.09899494936611666
2023-11-25 23:02:47,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3104153.3333333335, ans=0.2
2023-11-25 23:03:15,438 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465650
2023-11-25 23:03:20,400 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.42 vs. limit=5.0
2023-11-25 23:03:20,575 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8750, loss[loss=0.0706, simple_loss=0.1031, pruned_loss=0.01269, audio_tagging_loss=0.00634, over 15990.00 frames. ], tot_loss[loss=0.06781, simple_loss=0.09152, pruned_loss=0.01278, audio_tagging_loss=0.009264, over 3047612.54 frames. ], batch size: 61, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 23:03:22,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3104353.3333333335, ans=10.0
2023-11-25 23:03:28,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3104353.3333333335, ans=0.125
2023-11-25 23:03:30,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3104420.0, ans=0.2
2023-11-25 23:03:53,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3104553.3333333335, ans=10.0
2023-11-25 23:03:54,272 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.79 vs. limit=15.0
2023-11-25 23:03:59,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3104553.3333333335, ans=0.2
2023-11-25 23:04:03,117 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.234e+01 8.696e+01 9.362e+01 9.858e+01 1.375e+02, threshold=1.872e+02, percent-clipped=0.0
2023-11-25 23:04:07,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3104620.0, ans=0.125
2023-11-25 23:04:09,616 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465700
2023-11-25 23:04:14,839 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8800, loss[loss=0.07042, simple_loss=0.08793, pruned_loss=0.01544, audio_tagging_loss=0.01102, over 15664.00 frames. ], tot_loss[loss=0.06899, simple_loss=0.09307, pruned_loss=0.01318, audio_tagging_loss=0.009277, over 3051083.89 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 32.0
2023-11-25 23:04:39,391 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.92 vs. limit=15.0
2023-11-25 23:04:43,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3104820.0, ans=0.5
2023-11-25 23:04:51,195 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.07 vs. limit=22.5
2023-11-25 23:04:51,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3104886.6666666665, ans=0.2
2023-11-25 23:05:04,348 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465750
2023-11-25 23:05:04,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3104953.3333333335, ans=0.2
2023-11-25 23:05:09,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3105020.0, ans=0.1
2023-11-25 23:05:10,610 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8850, loss[loss=0.06382, simple_loss=0.08271, pruned_loss=0.01121, audio_tagging_loss=0.01125, over 15488.00 frames. ], tot_loss[loss=0.06839, simple_loss=0.09246, pruned_loss=0.01288, audio_tagging_loss=0.009279, over 3055559.65 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0
2023-11-25 23:05:15,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3105020.0, ans=0.125
2023-11-25 23:05:16,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3105020.0, ans=0.07
2023-11-25 23:05:17,914 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.52 vs. limit=22.5
2023-11-25 23:05:23,103 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 23:05:25,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3105086.6666666665, ans=0.1
2023-11-25 23:05:35,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3105153.3333333335, ans=0.125
2023-11-25 23:05:53,532 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.603e+01 8.482e+01 9.169e+01 1.001e+02 1.243e+02, threshold=1.834e+02, percent-clipped=0.0
2023-11-25 23:05:57,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3105286.6666666665, ans=0.0
2023-11-25 23:06:00,504 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465800
2023-11-25 23:06:06,438 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8900, loss[loss=0.07597, simple_loss=0.09854, pruned_loss=0.01665, audio_tagging_loss=0.01005, over 14102.00 frames. ], tot_loss[loss=0.06817, simple_loss=0.09226, pruned_loss=0.01294, audio_tagging_loss=0.009096, over 3048716.20 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 32.0
2023-11-25 23:06:19,678 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.00 vs. limit=15.0
2023-11-25 23:06:23,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3105420.0, ans=0.2
2023-11-25 23:06:23,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3105420.0, ans=0.125
2023-11-25 23:06:35,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3105486.6666666665, ans=0.0
2023-11-25 23:06:44,283 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0
2023-11-25 23:06:55,571 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465850
2023-11-25 23:06:55,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3105620.0, ans=0.025
2023-11-25 23:06:56,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3105620.0, ans=0.125
2023-11-25 23:07:00,871 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 8950, loss[loss=0.05529, simple_loss=0.06932, pruned_loss=0.01258, audio_tagging_loss=0.008047, over 15778.00 frames. ], tot_loss[loss=0.06769, simple_loss=0.09154, pruned_loss=0.01289, audio_tagging_loss=0.009028, over 3052241.75 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 32.0
2023-11-25 23:07:06,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3105686.6666666665, ans=0.125
2023-11-25 23:07:07,861 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.51 vs. limit=15.0
2023-11-25 23:07:15,305 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.78 vs. limit=22.5
2023-11-25 23:07:34,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3105886.6666666665, ans=0.125
2023-11-25 23:07:43,926 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.330e+01 8.637e+01 9.614e+01 1.032e+02 1.612e+02, threshold=1.923e+02, percent-clipped=0.0
2023-11-25 23:07:50,281 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465900
2023-11-25 23:07:56,487 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9000, loss[loss=0.05415, simple_loss=0.07086, pruned_loss=0.009943, audio_tagging_loss=0.008774, over 15074.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09147, pruned_loss=0.01271, audio_tagging_loss=0.008906, over 3053627.39 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 32.0
2023-11-25 23:07:56,492 INFO [train_asr.py:1258] (0/4) Computing validation loss
2023-11-25 23:08:13,725 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.2924, 4.2867, 4.5071, 4.4511], device='cuda:0')
2023-11-25 23:08:17,487 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.0025, 5.8417, 5.6134, 5.5608], device='cuda:0')
2023-11-25 23:08:28,217 INFO [train_asr.py:1267] (0/4) Epoch 39, validation: loss=0.05899, simple_loss=0.0507, pruned_loss=0.005227, audio_tagging_loss=0.02841, over 4681554.00 frames.
2023-11-25 23:08:28,218 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB
2023-11-25 23:08:32,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3106020.0, ans=0.125
2023-11-25 23:09:05,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=3106220.0, ans=0.95
2023-11-25 23:09:16,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3106286.6666666665, ans=0.125
2023-11-25 23:09:17,825 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 465950
2023-11-25 23:09:20,245 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.00 vs. limit=15.0
2023-11-25 23:09:22,965 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9050, loss[loss=0.06435, simple_loss=0.08322, pruned_loss=0.01364, audio_tagging_loss=0.009102, over 13989.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09108, pruned_loss=0.01267, audio_tagging_loss=0.008846, over 3048758.96 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 23:09:24,429 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.04 vs. limit=15.0
2023-11-25 23:09:30,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3106353.3333333335, ans=0.125
2023-11-25 23:09:35,812 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-25 23:09:52,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3106486.6666666665, ans=0.125
2023-11-25 23:09:54,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3106486.6666666665, ans=0.125
2023-11-25 23:09:59,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3106553.3333333335, ans=0.125
2023-11-25 23:10:02,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3106553.3333333335, ans=0.125
2023-11-25 23:10:07,084 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.183e+01 8.869e+01 9.445e+01 1.003e+02 1.420e+02, threshold=1.889e+02, percent-clipped=0.0
2023-11-25 23:10:09,787 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.93 vs. limit=15.0
2023-11-25 23:10:10,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3106620.0, ans=0.125
2023-11-25 23:10:12,412 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466000
2023-11-25 23:10:18,646 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9100, loss[loss=0.07384, simple_loss=0.1074, pruned_loss=0.01221, audio_tagging_loss=0.007956, over 15455.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09088, pruned_loss=0.01262, audio_tagging_loss=0.008813, over 3057057.99 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 23:10:27,020 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.96 vs. limit=15.0
2023-11-25 23:10:40,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3106820.0, ans=0.04949747468305833
2023-11-25 23:10:42,304 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.36 vs. limit=15.0
2023-11-25 23:11:07,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3106953.3333333335, ans=0.2
2023-11-25 23:11:08,266 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466050
2023-11-25 23:11:13,486 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9150, loss[loss=0.06032, simple_loss=0.08138, pruned_loss=0.009389, audio_tagging_loss=0.01024, over 14306.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09003, pruned_loss=0.01257, audio_tagging_loss=0.00882, over 3050554.58 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 23:11:14,132 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.16 vs. limit=6.0
2023-11-25 23:11:15,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3107020.0, ans=0.125
2023-11-25 23:11:23,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3107086.6666666665, ans=0.0
2023-11-25 23:11:26,977 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.11 vs. limit=22.5
2023-11-25 23:11:44,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3107153.3333333335, ans=0.2
2023-11-25 23:11:56,886 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.260e+01 8.490e+01 9.148e+01 9.794e+01 1.489e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-25 23:12:02,213 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466100
2023-11-25 23:12:04,267 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.41 vs. limit=15.0
2023-11-25 23:12:07,885 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9200, loss[loss=0.05466, simple_loss=0.06919, pruned_loss=0.009438, audio_tagging_loss=0.01063, over 14369.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.08973, pruned_loss=0.0126, audio_tagging_loss=0.008774, over 3050578.70 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0
2023-11-25 23:12:24,600 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.44 vs. limit=15.0
2023-11-25 23:12:28,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3107420.0, ans=0.125
2023-11-25 23:12:32,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3107486.6666666665, ans=0.0
2023-11-25 23:12:42,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3107553.3333333335, ans=0.0
2023-11-25 23:12:49,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3107553.3333333335, ans=0.125
2023-11-25 23:12:49,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3107553.3333333335, ans=0.95
2023-11-25 23:12:56,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3107620.0, ans=0.1
2023-11-25 23:12:57,039 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466150
2023-11-25 23:13:01,197 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.47 vs. limit=15.0
2023-11-25 23:13:03,216 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9250, loss[loss=0.07918, simple_loss=0.1093, pruned_loss=0.01529, audio_tagging_loss=0.009256, over 15237.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.08999, pruned_loss=0.01282, audio_tagging_loss=0.008875, over 3052261.57 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 32.0
2023-11-25 23:13:06,583 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-25 23:13:08,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3107686.6666666665, ans=0.125
2023-11-25 23:13:30,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3107820.0, ans=0.125
2023-11-25 23:13:30,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3107820.0, ans=0.0
2023-11-25 23:13:46,850 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.067e+01 8.554e+01 9.246e+01 1.012e+02 1.216e+02, threshold=1.849e+02, percent-clipped=0.0
2023-11-25 23:13:52,724 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466200
2023-11-25 23:13:58,079 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9300, loss[loss=0.06906, simple_loss=0.103, pruned_loss=0.01007, audio_tagging_loss=0.007505, over 15693.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09004, pruned_loss=0.01263, audio_tagging_loss=0.00892, over 3064188.20 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0
2023-11-25 23:14:08,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=3108086.6666666665, ans=0.95
2023-11-25 23:14:08,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3108086.6666666665, ans=0.5
2023-11-25 23:14:09,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3108086.6666666665, ans=0.125
2023-11-25 23:14:22,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3108153.3333333335, ans=0.2
2023-11-25 23:14:39,062 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.54 vs. limit=6.0
2023-11-25 23:14:46,892 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466250
2023-11-25 23:14:50,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3108286.6666666665, ans=0.09899494936611666
2023-11-25 23:14:52,126 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9350, loss[loss=0.08179, simple_loss=0.111, pruned_loss=0.01874, audio_tagging_loss=0.007551, over 15293.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09002, pruned_loss=0.01256, audio_tagging_loss=0.00897, over 3060412.77 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 23:15:00,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3108353.3333333335, ans=0.1
2023-11-25 23:15:07,118 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.52 vs. limit=10.0
2023-11-25 23:15:14,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3108486.6666666665, ans=0.125
2023-11-25 23:15:15,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3108486.6666666665, ans=0.0
2023-11-25 23:15:19,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3108486.6666666665, ans=0.95
2023-11-25 23:15:25,476 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.53 vs. limit=22.5
2023-11-25 23:15:36,370 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.736e+01 8.512e+01 9.083e+01 9.779e+01 1.171e+02, threshold=1.817e+02, percent-clipped=0.0
2023-11-25 23:15:39,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3108620.0, ans=0.1
2023-11-25 23:15:41,154 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466300
2023-11-25 23:15:46,820 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9400, loss[loss=0.06229, simple_loss=0.08481, pruned_loss=0.01162, audio_tagging_loss=0.008264, over 14700.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09033, pruned_loss=0.01244, audio_tagging_loss=0.009016, over 3067242.31 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 23:15:54,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3108686.6666666665, ans=0.0
2023-11-25 23:16:19,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3108886.6666666665, ans=0.0
2023-11-25 23:16:20,625 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-25 23:16:35,619 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466350
2023-11-25 23:16:41,286 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9450, loss[loss=0.07617, simple_loss=0.103, pruned_loss=0.01526, audio_tagging_loss=0.009418, over 15392.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.08981, pruned_loss=0.01244, audio_tagging_loss=0.009106, over 3061075.03 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 23:16:41,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3109020.0, ans=0.125
2023-11-25 23:16:42,349 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 23:17:25,982 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.965e+01 8.505e+01 9.184e+01 9.882e+01 1.417e+02, threshold=1.837e+02, percent-clipped=0.0
2023-11-25 23:17:30,179 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466400
2023-11-25 23:17:35,700 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9500, loss[loss=0.06455, simple_loss=0.08515, pruned_loss=0.01254, audio_tagging_loss=0.009434, over 14900.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09043, pruned_loss=0.01245, audio_tagging_loss=0.009116, over 3057752.01 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 23:17:41,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3109353.3333333335, ans=0.125
2023-11-25 23:17:49,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3109420.0, ans=0.125
2023-11-25 23:17:54,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3109420.0, ans=0.0
2023-11-25 23:18:24,976 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466450
2023-11-25 23:18:29,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3109686.6666666665, ans=0.125
2023-11-25 23:18:30,751 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9550, loss[loss=0.08682, simple_loss=0.1219, pruned_loss=0.01921, audio_tagging_loss=0.006673, over 15307.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09084, pruned_loss=0.01247, audio_tagging_loss=0.009164, over 3059057.13 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 23:18:37,466 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.87 vs. limit=22.5
2023-11-25 23:18:38,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3109686.6666666665, ans=0.125
2023-11-25 23:18:45,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3109753.3333333335, ans=0.125
2023-11-25 23:18:53,661 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.41 vs. limit=22.5
2023-11-25 23:18:56,727 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.07 vs. limit=12.0
2023-11-25 23:19:00,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3109820.0, ans=0.04949747468305833
2023-11-25 23:19:04,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3109886.6666666665, ans=0.125
2023-11-25 23:19:16,192 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.243e+01 8.693e+01 9.287e+01 1.001e+02 1.223e+02, threshold=1.857e+02, percent-clipped=0.0
2023-11-25 23:19:20,394 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466500
2023-11-25 23:19:21,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3109953.3333333335, ans=0.0
2023-11-25 23:19:22,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3109953.3333333335, ans=0.0
2023-11-25 23:19:26,133 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9600, loss[loss=0.04256, simple_loss=0.05518, pruned_loss=0.005621, audio_tagging_loss=0.009349, over 14109.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.09068, pruned_loss=0.01235, audio_tagging_loss=0.009195, over 3053028.51 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0
2023-11-25 23:19:27,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3110020.0, ans=0.2
2023-11-25 23:19:51,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3110153.3333333335, ans=0.05
2023-11-25 23:19:53,673 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.57 vs. limit=15.0
2023-11-25 23:19:54,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3110153.3333333335, ans=0.125
2023-11-25 23:20:12,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3110286.6666666665, ans=0.2
2023-11-25 23:20:13,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3110286.6666666665, ans=0.0
2023-11-25 23:20:14,848 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466550
2023-11-25 23:20:15,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3110286.6666666665, ans=0.125
2023-11-25 23:20:19,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3110353.3333333335, ans=0.1
2023-11-25 23:20:20,017 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9650, loss[loss=0.06214, simple_loss=0.08404, pruned_loss=0.01171, audio_tagging_loss=0.008406, over 15647.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.09057, pruned_loss=0.0125, audio_tagging_loss=0.009171, over 3050080.08 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 32.0
2023-11-25 23:20:21,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3110353.3333333335, ans=0.125
2023-11-25 23:20:29,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3110353.3333333335, ans=15.0
2023-11-25 23:20:36,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.whiten.whitening_limit, batch_count=3110420.0, ans=15.0
2023-11-25 23:21:05,143 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.725e+01 8.886e+01 9.411e+01 1.006e+02 1.308e+02, threshold=1.882e+02, percent-clipped=0.0
2023-11-25 23:21:09,293 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466600
2023-11-25 23:21:14,687 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9700, loss[loss=0.06723, simple_loss=0.09241, pruned_loss=0.0117, audio_tagging_loss=0.009323, over 14710.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.09105, pruned_loss=0.01258, audio_tagging_loss=0.009079, over 3056822.03 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 32.0
2023-11-25 23:21:53,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3110886.6666666665, ans=0.125
2023-11-25 23:21:55,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3110886.6666666665, ans=0.0
2023-11-25 23:22:04,878 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466650
2023-11-25 23:22:07,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3110953.3333333335, ans=0.0
2023-11-25 23:22:09,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3111020.0, ans=0.125
2023-11-25 23:22:11,105 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9750, loss[loss=0.05029, simple_loss=0.06706, pruned_loss=0.007045, audio_tagging_loss=0.00972, over 16327.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09067, pruned_loss=0.01249, audio_tagging_loss=0.009003, over 3053544.51 frames. ], batch size: 64, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 23:22:13,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3111020.0, ans=0.0
2023-11-25 23:22:17,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3111020.0, ans=0.0
2023-11-25 23:22:28,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3111086.6666666665, ans=0.04949747468305833
2023-11-25 23:22:34,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3111153.3333333335, ans=0.125
2023-11-25 23:22:35,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3111153.3333333335, ans=0.125
2023-11-25 23:22:44,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3111220.0, ans=0.125
2023-11-25 23:22:53,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3111220.0, ans=10.0
2023-11-25 23:22:57,246 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.121e+01 8.598e+01 9.280e+01 1.031e+02 1.262e+02, threshold=1.856e+02, percent-clipped=0.0
2023-11-25 23:23:00,458 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466700
2023-11-25 23:23:05,716 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9800, loss[loss=0.06447, simple_loss=0.0755, pruned_loss=0.01451, audio_tagging_loss=0.01221, over 15250.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09016, pruned_loss=0.01265, audio_tagging_loss=0.008835, over 3049964.69 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 23:23:15,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3111420.0, ans=0.0
2023-11-25 23:23:17,869 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.23 vs. limit=22.5
2023-11-25 23:23:19,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3111420.0, ans=0.125
2023-11-25 23:23:28,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3111486.6666666665, ans=0.125
2023-11-25 23:23:32,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3111486.6666666665, ans=0.09899494936611666
2023-11-25 23:23:43,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3111553.3333333335, ans=0.0
2023-11-25 23:23:55,161 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466750
2023-11-25 23:23:56,131 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 23:24:00,427 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9850, loss[loss=0.09226, simple_loss=0.1334, pruned_loss=0.01938, audio_tagging_loss=0.006178, over 14891.00 frames. ], tot_loss[loss=0.06739, simple_loss=0.09138, pruned_loss=0.01291, audio_tagging_loss=0.008788, over 3050267.97 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 23:24:02,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3111686.6666666665, ans=0.2
2023-11-25 23:24:21,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3111820.0, ans=0.025
2023-11-25 23:24:22,514 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.54 vs. limit=15.0
2023-11-25 23:24:26,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3111820.0, ans=0.2
2023-11-25 23:24:29,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3111820.0, ans=0.09899494936611666
2023-11-25 23:24:30,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3111820.0, ans=0.0
2023-11-25 23:24:31,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3111820.0, ans=0.1
2023-11-25 23:24:34,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3111886.6666666665, ans=0.2
2023-11-25 23:24:39,162 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=15.0
2023-11-25 23:24:45,829 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.982e+01 8.652e+01 9.205e+01 1.019e+02 1.596e+02, threshold=1.841e+02, percent-clipped=0.0
2023-11-25 23:24:47,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3111953.3333333335, ans=0.0
2023-11-25 23:24:50,027 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466800
2023-11-25 23:24:50,432 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.05 vs. limit=10.0
2023-11-25 23:24:55,451 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9900, loss[loss=0.03927, simple_loss=0.04815, pruned_loss=0.00486, audio_tagging_loss=0.01034, over 15592.00 frames. ], tot_loss[loss=0.0679, simple_loss=0.09199, pruned_loss=0.01306, audio_tagging_loss=0.008843, over 3048416.26 frames. ], batch size: 61, lr: 1.73e-03, grad_scale: 16.0
2023-11-25 23:24:55,931 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.97 vs.
limit=15.0 2023-11-25 23:25:00,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3112020.0, ans=0.125 2023-11-25 23:25:01,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3112020.0, ans=0.0 2023-11-25 23:25:11,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3112086.6666666665, ans=0.125 2023-11-25 23:25:11,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3112086.6666666665, ans=0.0 2023-11-25 23:25:20,021 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.64 vs. limit=22.5 2023-11-25 23:25:21,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3112153.3333333335, ans=0.125 2023-11-25 23:25:32,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3112220.0, ans=0.2 2023-11-25 23:25:41,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3112286.6666666665, ans=0.125 2023-11-25 23:25:45,814 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466850 2023-11-25 23:25:51,038 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 9950, loss[loss=0.05104, simple_loss=0.07344, pruned_loss=0.007338, audio_tagging_loss=0.006988, over 15283.00 frames. ], tot_loss[loss=0.06777, simple_loss=0.09177, pruned_loss=0.01303, audio_tagging_loss=0.008855, over 3061113.44 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:25:55,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3112353.3333333335, ans=0.1 2023-11-25 23:25:56,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3112353.3333333335, ans=0.125 2023-11-25 23:26:14,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3112486.6666666665, ans=0.125 2023-11-25 23:26:24,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3112553.3333333335, ans=0.125 2023-11-25 23:26:30,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3112553.3333333335, ans=0.125 2023-11-25 23:26:37,235 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.460e+01 8.531e+01 9.197e+01 9.885e+01 1.494e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-25 23:26:40,491 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466900 2023-11-25 23:26:45,710 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10000, loss[loss=0.08175, simple_loss=0.1175, pruned_loss=0.01652, audio_tagging_loss=0.006497, over 16181.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.08986, pruned_loss=0.01265, audio_tagging_loss=0.008899, over 3058200.76 frames. 
], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:26:54,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3112686.6666666665, ans=0.0 2023-11-25 23:27:28,747 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:27:34,940 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 466950 2023-11-25 23:27:41,129 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10050, loss[loss=0.05656, simple_loss=0.08083, pruned_loss=0.007667, audio_tagging_loss=0.008476, over 14803.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.08993, pruned_loss=0.01258, audio_tagging_loss=0.008879, over 3049619.65 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:27:51,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3113086.6666666665, ans=0.05 2023-11-25 23:27:53,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3113086.6666666665, ans=0.125 2023-11-25 23:27:55,159 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.43 vs. limit=10.0 2023-11-25 23:28:07,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3113153.3333333335, ans=0.0 2023-11-25 23:28:28,025 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.248e+01 8.554e+01 9.112e+01 9.756e+01 1.275e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-25 23:28:30,658 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467000 2023-11-25 23:28:36,525 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10100, loss[loss=0.05936, simple_loss=0.07716, pruned_loss=0.009824, audio_tagging_loss=0.01096, over 14391.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08903, pruned_loss=0.01252, audio_tagging_loss=0.009024, over 3047172.85 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:28:37,860 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:28:47,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3113420.0, ans=0.2 2023-11-25 23:28:51,447 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.33 vs. limit=15.0 2023-11-25 23:28:53,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3113420.0, ans=0.125 2023-11-25 23:29:22,922 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-25 23:29:26,110 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467050 2023-11-25 23:29:31,201 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10150, loss[loss=0.0538, simple_loss=0.0662, pruned_loss=0.009103, audio_tagging_loss=0.0116, over 14116.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08853, pruned_loss=0.01239, audio_tagging_loss=0.009135, over 3042611.83 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:29:33,947 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.53 vs. limit=15.0 2023-11-25 23:29:43,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3113753.3333333335, ans=0.0 2023-11-25 23:29:55,353 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.49 vs. limit=15.0 2023-11-25 23:29:58,938 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 23:29:59,284 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=15.0 2023-11-25 23:30:06,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3113886.6666666665, ans=0.125 2023-11-25 23:30:11,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3113886.6666666665, ans=0.1 2023-11-25 23:30:18,231 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.487e+01 8.705e+01 9.387e+01 9.994e+01 1.374e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-25 23:30:20,435 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467100 2023-11-25 23:30:21,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3113953.3333333335, ans=0.125 2023-11-25 23:30:25,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3114020.0, ans=0.125 2023-11-25 23:30:26,742 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10200, loss[loss=0.05845, simple_loss=0.08129, pruned_loss=0.01132, audio_tagging_loss=0.006482, over 14941.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08886, pruned_loss=0.01247, audio_tagging_loss=0.009178, over 3049351.73 frames. 
], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:30:39,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3114086.6666666665, ans=0.125 2023-11-25 23:30:41,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3114086.6666666665, ans=0.125 2023-11-25 23:30:42,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3114086.6666666665, ans=0.125 2023-11-25 23:30:45,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3114086.6666666665, ans=0.1 2023-11-25 23:30:49,590 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 23:30:52,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3114153.3333333335, ans=0.07 2023-11-25 23:30:57,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3114153.3333333335, ans=0.0 2023-11-25 23:31:12,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3114286.6666666665, ans=0.125 2023-11-25 23:31:16,502 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467150 2023-11-25 23:31:21,707 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10250, loss[loss=0.0607, simple_loss=0.08939, pruned_loss=0.009071, audio_tagging_loss=0.006932, over 15191.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.08953, pruned_loss=0.01249, audio_tagging_loss=0.00926, over 3050917.95 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:31:33,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3114420.0, ans=0.0 2023-11-25 23:31:51,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3114486.6666666665, ans=0.0 2023-11-25 23:32:02,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3114553.3333333335, ans=0.125 2023-11-25 23:32:04,033 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2023-11-25 23:32:08,829 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.548e+01 8.876e+01 9.394e+01 1.009e+02 1.335e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-25 23:32:11,586 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467200 2023-11-25 23:32:17,023 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10300, loss[loss=0.05489, simple_loss=0.07303, pruned_loss=0.007785, audio_tagging_loss=0.0106, over 15294.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.08969, pruned_loss=0.01246, audio_tagging_loss=0.009272, over 3050355.79 frames. 
], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:32:28,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=3114753.3333333335, ans=15.0 2023-11-25 23:32:31,687 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.98 vs. limit=12.0 2023-11-25 23:32:32,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3114753.3333333335, ans=0.125 2023-11-25 23:32:40,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3114820.0, ans=0.125 2023-11-25 23:32:43,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3114820.0, ans=0.0 2023-11-25 23:32:54,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3114886.6666666665, ans=0.0 2023-11-25 23:32:59,216 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.33 vs. limit=15.0 2023-11-25 23:33:06,210 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467250 2023-11-25 23:33:08,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3114953.3333333335, ans=0.07 2023-11-25 23:33:11,802 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10350, loss[loss=0.06937, simple_loss=0.08595, pruned_loss=0.01625, audio_tagging_loss=0.01014, over 14702.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09023, pruned_loss=0.01267, audio_tagging_loss=0.009331, over 3049745.08 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:33:28,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3115086.6666666665, ans=0.0 2023-11-25 23:33:58,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3115286.6666666665, ans=0.125 2023-11-25 23:33:59,133 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.500e+01 8.695e+01 9.211e+01 9.915e+01 1.210e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-25 23:33:59,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3115286.6666666665, ans=0.1 2023-11-25 23:34:01,266 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467300 2023-11-25 23:34:06,990 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10400, loss[loss=0.05981, simple_loss=0.07948, pruned_loss=0.01249, audio_tagging_loss=0.007581, over 14413.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.08911, pruned_loss=0.01237, audio_tagging_loss=0.009369, over 3048692.50 frames. 
], batch size: 55, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:34:13,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3115353.3333333335, ans=0.125 2023-11-25 23:34:24,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3115420.0, ans=0.04949747468305833 2023-11-25 23:34:43,622 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:34:55,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3115620.0, ans=0.1 2023-11-25 23:34:56,642 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467350 2023-11-25 23:34:59,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3115620.0, ans=0.125 2023-11-25 23:35:01,764 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10450, loss[loss=0.07049, simple_loss=0.09212, pruned_loss=0.01639, audio_tagging_loss=0.008041, over 15654.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.08898, pruned_loss=0.01244, audio_tagging_loss=0.00932, over 3049245.89 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:35:04,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3115686.6666666665, ans=0.125 2023-11-25 23:35:04,920 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.89 vs. limit=15.0 2023-11-25 23:35:16,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3115753.3333333335, ans=0.1 2023-11-25 23:35:37,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3115886.6666666665, ans=0.015 2023-11-25 23:35:42,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3115886.6666666665, ans=0.125 2023-11-25 23:35:46,556 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=22.5 2023-11-25 23:35:49,277 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.667e+01 9.396e+01 1.018e+02 1.785e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-25 23:35:51,408 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467400 2023-11-25 23:35:56,808 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10500, loss[loss=0.07229, simple_loss=0.09365, pruned_loss=0.01744, audio_tagging_loss=0.00803, over 14317.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08872, pruned_loss=0.01238, audio_tagging_loss=0.009146, over 3058901.82 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:36:01,383 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.40 vs. 
limit=15.0 2023-11-25 23:36:07,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3116086.6666666665, ans=0.1 2023-11-25 23:36:15,703 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.15 vs. limit=15.0 2023-11-25 23:36:16,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3116086.6666666665, ans=0.125 2023-11-25 23:36:16,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3116086.6666666665, ans=0.125 2023-11-25 23:36:18,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3116153.3333333335, ans=0.2 2023-11-25 23:36:19,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3116153.3333333335, ans=0.1 2023-11-25 23:36:46,892 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467450 2023-11-25 23:36:52,574 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10550, loss[loss=0.05227, simple_loss=0.06984, pruned_loss=0.008049, audio_tagging_loss=0.0093, over 15252.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08843, pruned_loss=0.01238, audio_tagging_loss=0.009066, over 3050059.06 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:37:03,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3116420.0, ans=0.125 2023-11-25 23:37:05,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3116420.0, ans=0.0 2023-11-25 23:37:09,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3116420.0, ans=0.0 2023-11-25 23:37:40,572 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.206e+01 8.690e+01 9.247e+01 9.972e+01 1.800e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-25 23:37:41,726 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467500 2023-11-25 23:37:46,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3116686.6666666665, ans=0.125 2023-11-25 23:37:46,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3116686.6666666665, ans=0.125 2023-11-25 23:37:46,828 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10600, loss[loss=0.05134, simple_loss=0.0586, pruned_loss=0.0113, audio_tagging_loss=0.01073, over 14208.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.0887, pruned_loss=0.01237, audio_tagging_loss=0.009025, over 3047611.03 frames. 
], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:37:50,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3116686.6666666665, ans=0.1 2023-11-25 23:37:53,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3116686.6666666665, ans=0.125 2023-11-25 23:38:11,114 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.37 vs. limit=15.0 2023-11-25 23:38:35,966 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467550 2023-11-25 23:38:41,682 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10650, loss[loss=0.05064, simple_loss=0.06682, pruned_loss=0.009356, audio_tagging_loss=0.007872, over 15378.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08888, pruned_loss=0.01236, audio_tagging_loss=0.008867, over 3047488.81 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:38:48,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3117020.0, ans=0.125 2023-11-25 23:38:52,544 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.11 vs. limit=15.0 2023-11-25 23:39:00,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3117086.6666666665, ans=0.1 2023-11-25 23:39:13,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3117220.0, ans=0.125 2023-11-25 23:39:22,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3117220.0, ans=0.125 2023-11-25 23:39:22,679 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.64 vs. limit=15.0 2023-11-25 23:39:26,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3117286.6666666665, ans=0.1 2023-11-25 23:39:30,337 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.082e+01 8.807e+01 9.255e+01 1.012e+02 1.355e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-25 23:39:31,439 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467600 2023-11-25 23:39:36,825 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10700, loss[loss=0.06643, simple_loss=0.1015, pruned_loss=0.009847, audio_tagging_loss=0.005844, over 14742.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08856, pruned_loss=0.01225, audio_tagging_loss=0.00888, over 3038939.06 frames. 
], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:39:41,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3117353.3333333335, ans=0.1 2023-11-25 23:39:55,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3117420.0, ans=0.125 2023-11-25 23:39:57,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3117486.6666666665, ans=0.2 2023-11-25 23:39:59,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3117486.6666666665, ans=0.0 2023-11-25 23:40:07,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3117486.6666666665, ans=0.0 2023-11-25 23:40:16,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3117553.3333333335, ans=0.125 2023-11-25 23:40:24,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3117620.0, ans=0.1 2023-11-25 23:40:26,119 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467650 2023-11-25 23:40:31,274 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10750, loss[loss=0.0512, simple_loss=0.06232, pruned_loss=0.01007, audio_tagging_loss=0.009963, over 14024.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08912, pruned_loss=0.01248, audio_tagging_loss=0.008805, over 3041219.72 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:40:38,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3117686.6666666665, ans=0.0 2023-11-25 23:41:12,019 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:41:14,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3117953.3333333335, ans=0.125 2023-11-25 23:41:18,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3117953.3333333335, ans=10.0 2023-11-25 23:41:19,192 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.994e+01 8.803e+01 9.280e+01 9.939e+01 1.365e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-25 23:41:20,300 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467700 2023-11-25 23:41:25,499 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10800, loss[loss=0.05975, simple_loss=0.08196, pruned_loss=0.01004, audio_tagging_loss=0.008731, over 14215.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08926, pruned_loss=0.01239, audio_tagging_loss=0.008806, over 3038963.02 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:41:31,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3118020.0, ans=0.1 2023-11-25 23:41:50,026 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.00 vs. 
limit=10.0 2023-11-25 23:42:08,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3118286.6666666665, ans=0.125 2023-11-25 23:42:14,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3118286.6666666665, ans=0.125 2023-11-25 23:42:15,816 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467750 2023-11-25 23:42:20,983 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10850, loss[loss=0.05924, simple_loss=0.06243, pruned_loss=0.01199, audio_tagging_loss=0.01603, over 15237.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08894, pruned_loss=0.01237, audio_tagging_loss=0.008987, over 3043865.52 frames. ], batch size: 61, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:42:26,283 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.77 vs. limit=15.0 2023-11-25 23:42:28,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3118353.3333333335, ans=0.125 2023-11-25 23:42:38,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3118420.0, ans=0.125 2023-11-25 23:43:01,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3118553.3333333335, ans=0.125 2023-11-25 23:43:09,849 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.755e+01 9.381e+01 1.019e+02 1.994e+02, threshold=1.876e+02, percent-clipped=1.0 2023-11-25 23:43:09,943 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467800 2023-11-25 23:43:14,302 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 23:43:15,316 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10900, loss[loss=0.06784, simple_loss=0.09222, pruned_loss=0.01205, audio_tagging_loss=0.009684, over 14417.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08921, pruned_loss=0.01229, audio_tagging_loss=0.009017, over 3045380.57 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:43:42,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3118820.0, ans=0.1 2023-11-25 23:43:47,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3118886.6666666665, ans=0.04949747468305833 2023-11-25 23:43:50,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3118886.6666666665, ans=0.125 2023-11-25 23:44:04,184 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467850 2023-11-25 23:44:09,318 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 10950, loss[loss=0.05659, simple_loss=0.07404, pruned_loss=0.00917, audio_tagging_loss=0.0104, over 14581.00 frames. 
], tot_loss[loss=0.06586, simple_loss=0.08911, pruned_loss=0.01228, audio_tagging_loss=0.009021, over 3041233.54 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:44:14,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3119020.0, ans=0.2 2023-11-25 23:44:55,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3119286.6666666665, ans=0.125 2023-11-25 23:44:55,793 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.02 vs. limit=22.5 2023-11-25 23:44:58,213 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.075e+01 8.372e+01 9.128e+01 9.666e+01 1.249e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-25 23:44:58,305 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467900 2023-11-25 23:45:04,513 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11000, loss[loss=0.05604, simple_loss=0.07637, pruned_loss=0.008817, audio_tagging_loss=0.009036, over 16433.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08918, pruned_loss=0.01238, audio_tagging_loss=0.009036, over 3048954.07 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:45:12,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3119353.3333333335, ans=0.125 2023-11-25 23:45:15,973 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 23:45:18,643 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.30 vs. limit=15.0 2023-11-25 23:45:23,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3119420.0, ans=0.125 2023-11-25 23:45:44,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3119553.3333333335, ans=0.0 2023-11-25 23:45:54,421 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 467950 2023-11-25 23:45:59,568 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11050, loss[loss=0.0615, simple_loss=0.07815, pruned_loss=0.01008, audio_tagging_loss=0.01234, over 15346.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08908, pruned_loss=0.01228, audio_tagging_loss=0.009204, over 3049910.94 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:46:00,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3119686.6666666665, ans=0.125 2023-11-25 23:46:03,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3119686.6666666665, ans=0.0 2023-11-25 23:46:04,383 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.41 vs. 
limit=12.0 2023-11-25 23:46:06,516 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.60 vs. limit=22.5 2023-11-25 23:46:07,375 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.94 vs. limit=22.5 2023-11-25 23:46:18,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3119753.3333333335, ans=0.0 2023-11-25 23:46:36,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3119886.6666666665, ans=0.0 2023-11-25 23:46:48,343 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.121e+01 8.692e+01 9.297e+01 1.029e+02 1.368e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-25 23:46:48,440 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468000 2023-11-25 23:46:49,713 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-468000.pt 2023-11-25 23:46:55,484 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11100, loss[loss=0.05541, simple_loss=0.06367, pruned_loss=0.01038, audio_tagging_loss=0.0132, over 15975.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.08949, pruned_loss=0.01244, audio_tagging_loss=0.009344, over 3058512.71 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:47:26,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3120153.3333333335, ans=0.0 2023-11-25 23:47:41,807 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.60 vs. limit=15.0 2023-11-25 23:47:44,351 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468050 2023-11-25 23:47:44,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3120286.6666666665, ans=0.0 2023-11-25 23:47:50,026 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11150, loss[loss=0.06694, simple_loss=0.09216, pruned_loss=0.01181, audio_tagging_loss=0.009049, over 15304.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.08986, pruned_loss=0.01253, audio_tagging_loss=0.009414, over 3057598.27 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:48:05,148 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.21 vs. limit=12.0 2023-11-25 23:48:22,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3120553.3333333335, ans=0.125 2023-11-25 23:48:36,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3120620.0, ans=0.125 2023-11-25 23:48:38,760 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.667e+01 9.262e+01 9.903e+01 1.395e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-25 23:48:38,854 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468100 2023-11-25 23:48:43,929 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11200, loss[loss=0.05519, simple_loss=0.07422, pruned_loss=0.01021, audio_tagging_loss=0.007876, over 16112.00 frames. 
], tot_loss[loss=0.0663, simple_loss=0.08887, pruned_loss=0.01248, audio_tagging_loss=0.009394, over 3054349.95 frames. ], batch size: 62, lr: 1.72e-03, grad_scale: 32.0 2023-11-25 23:48:55,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3120753.3333333335, ans=0.125 2023-11-25 23:49:02,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3120753.3333333335, ans=0.0 2023-11-25 23:49:04,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3120820.0, ans=0.125 2023-11-25 23:49:16,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3120886.6666666665, ans=0.1 2023-11-25 23:49:16,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3120886.6666666665, ans=0.2 2023-11-25 23:49:26,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3120953.3333333335, ans=0.125 2023-11-25 23:49:26,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3120953.3333333335, ans=0.125 2023-11-25 23:49:31,661 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468150 2023-11-25 23:49:36,746 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11250, loss[loss=0.05603, simple_loss=0.06556, pruned_loss=0.0128, audio_tagging_loss=0.01046, over 14672.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.089, pruned_loss=0.01269, audio_tagging_loss=0.009383, over 3053115.63 frames. ], batch size: 57, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:49:37,306 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.83 vs. limit=15.0 2023-11-25 23:49:56,834 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.62 vs. limit=15.0 2023-11-25 23:50:04,157 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.46 vs. limit=12.0 2023-11-25 23:50:11,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3121220.0, ans=0.0 2023-11-25 23:50:19,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3121286.6666666665, ans=0.0 2023-11-25 23:50:25,543 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468200 2023-11-25 23:50:26,480 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.350e+01 8.668e+01 9.346e+01 1.011e+02 2.547e+02, threshold=1.869e+02, percent-clipped=1.0 2023-11-25 23:50:28,406 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.17 vs. limit=15.0 2023-11-25 23:50:31,484 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11300, loss[loss=0.06132, simple_loss=0.08661, pruned_loss=0.009016, audio_tagging_loss=0.008998, over 15897.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.08877, pruned_loss=0.01261, audio_tagging_loss=0.009316, over 3045133.09 frames. 
], batch size: 60, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:51:06,429 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.59 vs. limit=22.5 2023-11-25 23:51:09,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=3121553.3333333335, ans=0.02 2023-11-25 23:51:09,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3121553.3333333335, ans=0.125 2023-11-25 23:51:20,347 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468250 2023-11-25 23:51:25,989 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11350, loss[loss=0.05496, simple_loss=0.0782, pruned_loss=0.007178, audio_tagging_loss=0.008683, over 14438.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.08936, pruned_loss=0.01279, audio_tagging_loss=0.009102, over 3047357.00 frames. ], batch size: 56, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:51:32,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3121686.6666666665, ans=0.0 2023-11-25 23:51:35,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3121753.3333333335, ans=0.125 2023-11-25 23:51:47,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3121820.0, ans=0.125 2023-11-25 23:51:48,780 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.42 vs. limit=12.0 2023-11-25 23:52:01,175 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.62 vs. limit=22.5 2023-11-25 23:52:15,225 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468300 2023-11-25 23:52:16,144 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.581e+01 8.715e+01 9.313e+01 1.012e+02 1.423e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-25 23:52:20,314 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11400, loss[loss=0.07945, simple_loss=0.114, pruned_loss=0.0132, audio_tagging_loss=0.009235, over 15521.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08912, pruned_loss=0.01261, audio_tagging_loss=0.009019, over 3046778.48 frames. ], batch size: 54, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:52:53,980 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.87 vs. limit=15.0 2023-11-25 23:53:06,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3122286.6666666665, ans=0.2 2023-11-25 23:53:09,055 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468350 2023-11-25 23:53:12,318 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:53:14,128 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11450, loss[loss=0.07116, simple_loss=0.09153, pruned_loss=0.01644, audio_tagging_loss=0.008952, over 15274.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.089, pruned_loss=0.01254, audio_tagging_loss=0.008948, over 3046382.67 frames. 
], batch size: 59, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:53:19,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=3122353.3333333335, ans=0.025 2023-11-25 23:53:20,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3122353.3333333335, ans=0.1 2023-11-25 23:53:22,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3122353.3333333335, ans=0.2 2023-11-25 23:54:03,231 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468400 2023-11-25 23:54:04,160 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.427e+01 8.283e+01 9.283e+01 1.005e+02 1.593e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-25 23:54:09,206 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11500, loss[loss=0.06416, simple_loss=0.07923, pruned_loss=0.01438, audio_tagging_loss=0.01016, over 17038.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08924, pruned_loss=0.01265, audio_tagging_loss=0.008892, over 3052152.26 frames. ], batch size: 63, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:54:26,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3122753.3333333335, ans=0.0 2023-11-25 23:54:27,906 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.64 vs. limit=6.0 2023-11-25 23:54:49,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3122886.6666666665, ans=0.2 2023-11-25 23:54:57,826 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468450 2023-11-25 23:55:02,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3123020.0, ans=0.0 2023-11-25 23:55:03,506 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11550, loss[loss=0.06032, simple_loss=0.06932, pruned_loss=0.01198, audio_tagging_loss=0.01369, over 15680.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08888, pruned_loss=0.01258, audio_tagging_loss=0.008925, over 3053845.66 frames. ], batch size: 61, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:55:25,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3123153.3333333335, ans=0.125 2023-11-25 23:55:28,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3123153.3333333335, ans=0.0 2023-11-25 23:55:30,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3123153.3333333335, ans=0.2 2023-11-25 23:55:33,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=3123153.3333333335, ans=0.025 2023-11-25 23:55:38,152 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-25 23:55:52,197 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468500 2023-11-25 23:55:53,115 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.591e+01 8.807e+01 9.353e+01 9.870e+01 1.294e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-25 23:55:55,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3123286.6666666665, ans=0.2 2023-11-25 23:55:57,372 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11600, loss[loss=0.07026, simple_loss=0.1025, pruned_loss=0.01173, audio_tagging_loss=0.007278, over 16888.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08923, pruned_loss=0.01243, audio_tagging_loss=0.008947, over 3050957.89 frames. ], batch size: 64, lr: 1.72e-03, grad_scale: 32.0 2023-11-25 23:56:04,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3123353.3333333335, ans=0.125 2023-11-25 23:56:15,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3123420.0, ans=0.0 2023-11-25 23:56:20,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3123486.6666666665, ans=0.025 2023-11-25 23:56:20,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3123486.6666666665, ans=0.125 2023-11-25 23:56:44,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3123620.0, ans=0.125 2023-11-25 23:56:47,260 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468550 2023-11-25 23:56:49,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3123620.0, ans=0.0 2023-11-25 23:56:50,938 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.16 vs. limit=15.0 2023-11-25 23:56:52,395 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11650, loss[loss=0.08682, simple_loss=0.1146, pruned_loss=0.01922, audio_tagging_loss=0.0103, over 14459.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09032, pruned_loss=0.01262, audio_tagging_loss=0.008942, over 3046424.28 frames. ], batch size: 53, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:56:55,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3123686.6666666665, ans=0.2 2023-11-25 23:57:02,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3123753.3333333335, ans=0.0 2023-11-25 23:57:08,546 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.44 vs. 
2023-11-25 23:57:12,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3123753.3333333335, ans=0.125
2023-11-25 23:57:41,984 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468600
2023-11-25 23:57:44,265 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.340e+01 8.612e+01 9.119e+01 9.760e+01 1.208e+02, threshold=1.824e+02, percent-clipped=0.0
2023-11-25 23:57:47,434 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11700, loss[loss=0.08904, simple_loss=0.1238, pruned_loss=0.01747, audio_tagging_loss=0.009677, over 13959.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09032, pruned_loss=0.01251, audio_tagging_loss=0.008961, over 3035634.26 frames. ], batch size: 52, lr: 1.72e-03, grad_scale: 16.0
2023-11-25 23:57:57,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3124086.6666666665, ans=0.0
2023-11-25 23:58:36,902 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468650
2023-11-25 23:58:42,048 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11750, loss[loss=0.06458, simple_loss=0.09561, pruned_loss=0.01165, audio_tagging_loss=0.005122, over 15592.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.0906, pruned_loss=0.01271, audio_tagging_loss=0.009023, over 3043013.28 frames. ], batch size: 58, lr: 1.72e-03, grad_scale: 16.0
2023-11-25 23:58:47,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3124353.3333333335, ans=0.125
2023-11-25 23:58:57,093 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.23 vs. limit=15.0
2023-11-25 23:59:13,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3124486.6666666665, ans=0.2
2023-11-25 23:59:27,416 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.35 vs. limit=12.0
2023-11-25 23:59:32,137 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468700
2023-11-25 23:59:34,135 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.396e+01 8.692e+01 9.354e+01 9.925e+01 1.548e+02, threshold=1.871e+02, percent-clipped=0.0
2023-11-25 23:59:37,269 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11800, loss[loss=0.0481, simple_loss=0.06213, pruned_loss=0.007666, audio_tagging_loss=0.009374, over 15463.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09019, pruned_loss=0.01269, audio_tagging_loss=0.00902, over 3046219.01 frames. ], batch size: 61, lr: 1.72e-03, grad_scale: 16.0
2023-11-25 23:59:54,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3124753.3333333335, ans=0.125
2023-11-25 23:59:54,635 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 00:00:16,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3124886.6666666665, ans=0.1
2023-11-26 00:00:26,142 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468750
2023-11-26 00:00:31,223 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11850, loss[loss=0.05869, simple_loss=0.0739, pruned_loss=0.009455, audio_tagging_loss=0.01229, over 14817.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09026, pruned_loss=0.01264, audio_tagging_loss=0.009126, over 3044220.24 frames. ], batch size: 55, lr: 1.72e-03, grad_scale: 16.0
2023-11-26 00:00:34,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3125020.0, ans=0.125
2023-11-26 00:00:37,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3125020.0, ans=0.125
2023-11-26 00:01:05,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3125220.0, ans=0.0
2023-11-26 00:01:12,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3125220.0, ans=0.125
2023-11-26 00:01:12,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3125220.0, ans=0.125
2023-11-26 00:01:20,117 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468800
2023-11-26 00:01:22,400 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.466e+01 8.741e+01 9.224e+01 1.012e+02 1.182e+02, threshold=1.845e+02, percent-clipped=0.0
2023-11-26 00:01:25,619 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11900, loss[loss=0.08326, simple_loss=0.1142, pruned_loss=0.01883, audio_tagging_loss=0.007302, over 15017.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.0904, pruned_loss=0.01262, audio_tagging_loss=0.009213, over 3051099.03 frames. ], batch size: 54, lr: 1.72e-03, grad_scale: 16.0
2023-11-26 00:01:27,794 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.52 vs. limit=15.0
2023-11-26 00:01:29,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3125353.3333333335, ans=0.125
2023-11-26 00:01:51,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3125486.6666666665, ans=0.0
2023-11-26 00:02:15,004 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468850
2023-11-26 00:02:15,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3125620.0, ans=0.0
2023-11-26 00:02:20,694 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 11950, loss[loss=0.04721, simple_loss=0.05966, pruned_loss=0.008054, audio_tagging_loss=0.009323, over 14219.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09071, pruned_loss=0.01267, audio_tagging_loss=0.009129, over 3047497.30 frames. ], batch size: 54, lr: 1.72e-03, grad_scale: 16.0
2023-11-26 00:02:22,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3125686.6666666665, ans=0.125
2023-11-26 00:02:29,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3125686.6666666665, ans=0.2
2023-11-26 00:02:33,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3125753.3333333335, ans=0.1
2023-11-26 00:02:42,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3125820.0, ans=0.125
2023-11-26 00:02:43,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3125820.0, ans=0.125
2023-11-26 00:02:43,742 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.48 vs. limit=22.5
2023-11-26 00:02:55,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3125886.6666666665, ans=0.125
2023-11-26 00:03:09,047 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468900
2023-11-26 00:03:11,571 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.661e+01 9.250e+01 9.933e+01 1.391e+02, threshold=1.850e+02, percent-clipped=0.0
2023-11-26 00:03:12,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3125953.3333333335, ans=0.125
2023-11-26 00:03:14,639 INFO [train_asr.py:1235] (0/4) Epoch 39, batch 12000, loss[loss=0.059, simple_loss=0.07798, pruned_loss=0.009709, audio_tagging_loss=0.0103, over 14387.00 frames. ], tot_loss[loss=0.06733, simple_loss=0.09055, pruned_loss=0.01275, audio_tagging_loss=0.009307, over 3046554.88 frames. ], batch size: 55, lr: 1.72e-03, grad_scale: 32.0
2023-11-26 00:03:14,641 INFO [train_asr.py:1258] (0/4) Computing validation loss
2023-11-26 00:03:25,876 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.0908, 5.7683, 5.4963, 5.5665], device='cuda:0')
2023-11-26 00:03:28,399 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([4.4835, 3.9721, 3.6758, 4.0983, 3.8687, 3.9687, 4.0799, 3.6070], device='cuda:0')
2023-11-26 00:03:47,129 INFO [train_asr.py:1267] (0/4) Epoch 39, validation: loss=0.05809, simple_loss=0.05065, pruned_loss=0.005132, audio_tagging_loss=0.02764, over 4681554.00 frames.
2023-11-26 00:03:47,130 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB
2023-11-26 00:03:49,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3126020.0, ans=0.07
2023-11-26 00:04:14,286 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-39.pt
2023-11-26 00:04:40,565 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 0, loss[loss=0.08285, simple_loss=0.08926, pruned_loss=0.01467, audio_tagging_loss=0.02355, over 15656.00 frames. ], tot_loss[loss=0.08285, simple_loss=0.08926, pruned_loss=0.01467, audio_tagging_loss=0.02355, over 15656.00 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 32.0
2023-11-26 00:04:40,567 INFO [train_asr.py:1258] (0/4) Computing validation loss
2023-11-26 00:05:12,149 INFO [train_asr.py:1267] (0/4) Epoch 40, validation: loss=0.05782, simple_loss=0.05064, pruned_loss=0.005121, audio_tagging_loss=0.02738, over 4681554.00 frames.
2023-11-26 00:05:12,150 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB
2023-11-26 00:05:18,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3126186.6666666665, ans=0.125
2023-11-26 00:05:20,090 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.20 vs. limit=15.0
2023-11-26 00:05:22,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3126253.3333333335, ans=0.0
2023-11-26 00:05:23,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=3126253.3333333335, ans=0.02
2023-11-26 00:05:27,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3126253.3333333335, ans=0.125
2023-11-26 00:05:34,112 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 468950
2023-11-26 00:06:04,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3126453.3333333335, ans=0.2
2023-11-26 00:06:07,102 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 50, loss[loss=0.07675, simple_loss=0.09895, pruned_loss=0.01358, audio_tagging_loss=0.01369, over 15437.00 frames. ], tot_loss[loss=0.07496, simple_loss=0.08844, pruned_loss=0.0126, audio_tagging_loss=0.01815, over 687234.47 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:06:21,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=3126586.6666666665, ans=0.1
2023-11-26 00:06:22,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3126586.6666666665, ans=0.025
2023-11-26 00:06:28,965 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469000
2023-11-26 00:06:32,407 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.568e+01 9.207e+01 9.971e+01 1.067e+02 1.313e+02, threshold=1.994e+02, percent-clipped=0.0
2023-11-26 00:06:55,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3126786.6666666665, ans=0.0
2023-11-26 00:07:02,685 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 100, loss[loss=0.06656, simple_loss=0.07078, pruned_loss=0.01369, audio_tagging_loss=0.01748, over 15600.00 frames. ], tot_loss[loss=0.07315, simple_loss=0.08745, pruned_loss=0.01218, audio_tagging_loss=0.01726, over 1206855.60 frames. ], batch size: 61, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:07:03,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3126853.3333333335, ans=0.0
2023-11-26 00:07:14,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3126920.0, ans=0.125
2023-11-26 00:07:25,678 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469050
2023-11-26 00:07:29,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3126986.6666666665, ans=0.2
2023-11-26 00:07:35,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3127053.3333333335, ans=0.125
2023-11-26 00:07:39,715 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.57 vs. limit=5.0
2023-11-26 00:07:42,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3127053.3333333335, ans=0.0
2023-11-26 00:07:48,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3127120.0, ans=0.125
2023-11-26 00:07:52,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3127120.0, ans=0.125
2023-11-26 00:07:58,584 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 150, loss[loss=0.07982, simple_loss=0.1106, pruned_loss=0.0139, audio_tagging_loss=0.01064, over 15962.00 frames. ], tot_loss[loss=0.07218, simple_loss=0.0892, pruned_loss=0.0125, audio_tagging_loss=0.01507, over 1611404.05 frames. ], batch size: 60, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:08:09,971 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.53 vs. limit=22.5
2023-11-26 00:08:19,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3127253.3333333335, ans=0.0
2023-11-26 00:08:21,760 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469100
2023-11-26 00:08:24,930 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.565e+01 9.020e+01 9.615e+01 1.041e+02 1.301e+02, threshold=1.923e+02, percent-clipped=0.0
2023-11-26 00:08:48,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3127453.3333333335, ans=0.0
2023-11-26 00:08:54,979 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 200, loss[loss=0.05883, simple_loss=0.08366, pruned_loss=0.008361, audio_tagging_loss=0.00864, over 15973.00 frames. ], tot_loss[loss=0.07179, simple_loss=0.09115, pruned_loss=0.01287, audio_tagging_loss=0.01335, over 1933514.54 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:08:55,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3127520.0, ans=0.05
2023-11-26 00:08:58,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3127520.0, ans=0.125
2023-11-26 00:09:03,432 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0
2023-11-26 00:09:16,961 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469150
2023-11-26 00:09:36,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3127720.0, ans=0.0
2023-11-26 00:09:49,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3127853.3333333335, ans=0.125
2023-11-26 00:09:50,288 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 250, loss[loss=0.06581, simple_loss=0.09476, pruned_loss=0.0122, audio_tagging_loss=0.006227, over 15796.00 frames. ], tot_loss[loss=0.0702, simple_loss=0.09027, pruned_loss=0.01284, audio_tagging_loss=0.01222, over 2177396.63 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 8.0
2023-11-26 00:09:51,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3127853.3333333335, ans=0.1
2023-11-26 00:10:04,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3127920.0, ans=0.125
2023-11-26 00:10:11,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3127986.6666666665, ans=0.0
2023-11-26 00:10:12,753 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469200
2023-11-26 00:10:16,280 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.00 vs. limit=15.0
2023-11-26 00:10:17,837 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.652e+01 8.769e+01 9.325e+01 1.022e+02 1.435e+02, threshold=1.865e+02, percent-clipped=0.0
2023-11-26 00:10:30,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3128053.3333333335, ans=0.09899494936611666
2023-11-26 00:10:35,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3128120.0, ans=0.125
2023-11-26 00:10:36,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3128120.0, ans=0.0
2023-11-26 00:10:46,174 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 300, loss[loss=0.04613, simple_loss=0.05124, pruned_loss=0.006488, audio_tagging_loss=0.01403, over 13876.00 frames. ], tot_loss[loss=0.06991, simple_loss=0.09121, pruned_loss=0.01302, audio_tagging_loss=0.01128, over 2362197.55 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 8.0
2023-11-26 00:10:56,518 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.67 vs. limit=15.0
2023-11-26 00:11:09,519 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469250
2023-11-26 00:11:25,858 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.48 vs. limit=6.0
2023-11-26 00:11:35,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3128453.3333333335, ans=0.2
2023-11-26 00:11:42,965 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 350, loss[loss=0.05724, simple_loss=0.07354, pruned_loss=0.01368, audio_tagging_loss=0.006783, over 14182.00 frames. ], tot_loss[loss=0.06978, simple_loss=0.09233, pruned_loss=0.01313, audio_tagging_loss=0.01048, over 2518138.34 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 8.0
2023-11-26 00:11:47,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3128520.0, ans=0.125
2023-11-26 00:11:54,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3128586.6666666665, ans=0.0
2023-11-26 00:11:55,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3128586.6666666665, ans=0.125
2023-11-26 00:11:57,899 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.84 vs. limit=22.5
2023-11-26 00:12:03,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3128653.3333333335, ans=0.0
2023-11-26 00:12:04,835 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469300
2023-11-26 00:12:08,967 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.623e+01 8.708e+01 9.325e+01 9.980e+01 1.485e+02, threshold=1.865e+02, percent-clipped=0.0
2023-11-26 00:12:15,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3128720.0, ans=0.0
2023-11-26 00:12:16,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3128720.0, ans=0.125
2023-11-26 00:12:19,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3128720.0, ans=0.2
2023-11-26 00:12:38,374 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 400, loss[loss=0.06055, simple_loss=0.08854, pruned_loss=0.009923, audio_tagging_loss=0.006352, over 16261.00 frames. ], tot_loss[loss=0.06883, simple_loss=0.09152, pruned_loss=0.01292, audio_tagging_loss=0.01015, over 2640081.22 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:12:38,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3128853.3333333335, ans=0.2
2023-11-26 00:12:39,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3128853.3333333335, ans=0.0
2023-11-26 00:12:40,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3128853.3333333335, ans=0.0
2023-11-26 00:12:57,468 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.91 vs. limit=22.5
2023-11-26 00:13:00,060 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469350
2023-11-26 00:13:32,807 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 450, loss[loss=0.08464, simple_loss=0.1069, pruned_loss=0.01925, audio_tagging_loss=0.01196, over 15240.00 frames. ], tot_loss[loss=0.06805, simple_loss=0.09074, pruned_loss=0.01273, audio_tagging_loss=0.009943, over 2730712.59 frames. ], batch size: 60, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:13:37,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3129186.6666666665, ans=0.125
2023-11-26 00:13:42,317 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.11 vs. limit=6.0
2023-11-26 00:13:47,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3129253.3333333335, ans=0.125
2023-11-26 00:13:50,643 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.12 vs. limit=15.0
2023-11-26 00:13:56,315 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469400
2023-11-26 00:14:00,681 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.213e+01 8.743e+01 9.299e+01 9.864e+01 1.390e+02, threshold=1.860e+02, percent-clipped=0.0
2023-11-26 00:14:00,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3129320.0, ans=0.0
2023-11-26 00:14:02,154 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=15.0
2023-11-26 00:14:03,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3129320.0, ans=0.125
2023-11-26 00:14:04,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3129320.0, ans=0.125
2023-11-26 00:14:07,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3129386.6666666665, ans=0.0
2023-11-26 00:14:10,772 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.17 vs. limit=15.0
2023-11-26 00:14:11,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3129386.6666666665, ans=0.1
2023-11-26 00:14:28,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3129520.0, ans=0.125
2023-11-26 00:14:28,984 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 500, loss[loss=0.06408, simple_loss=0.08389, pruned_loss=0.01215, audio_tagging_loss=0.009988, over 15040.00 frames. ], tot_loss[loss=0.06775, simple_loss=0.09082, pruned_loss=0.01275, audio_tagging_loss=0.009597, over 2802880.03 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:14:34,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3129520.0, ans=0.125
2023-11-26 00:14:36,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3129520.0, ans=0.125
2023-11-26 00:14:37,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3129520.0, ans=0.0
2023-11-26 00:14:44,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3129586.6666666665, ans=0.0
2023-11-26 00:14:51,401 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469450
2023-11-26 00:14:53,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3129653.3333333335, ans=10.0
2023-11-26 00:15:02,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3129720.0, ans=0.125
2023-11-26 00:15:14,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3129786.6666666665, ans=0.125
2023-11-26 00:15:24,644 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 550, loss[loss=0.07595, simple_loss=0.1069, pruned_loss=0.01642, audio_tagging_loss=0.006062, over 15083.00 frames. ], tot_loss[loss=0.0673, simple_loss=0.0903, pruned_loss=0.01268, audio_tagging_loss=0.009463, over 2860465.35 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:15:25,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3129853.3333333335, ans=0.2
2023-11-26 00:15:31,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3129853.3333333335, ans=0.0
2023-11-26 00:15:34,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3129920.0, ans=0.125
2023-11-26 00:15:38,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3129920.0, ans=0.0
2023-11-26 00:15:40,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3129920.0, ans=0.04949747468305833
2023-11-26 00:15:44,587 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.30 vs. limit=22.5
2023-11-26 00:15:46,710 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469500
2023-11-26 00:15:50,842 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.933e+01 8.611e+01 9.176e+01 9.917e+01 4.186e+02, threshold=1.835e+02, percent-clipped=1.0
2023-11-26 00:15:54,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3129986.6666666665, ans=0.1
2023-11-26 00:15:59,069 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.38 vs. limit=15.0
2023-11-26 00:16:14,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3130120.0, ans=0.1
2023-11-26 00:16:19,919 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 600, loss[loss=0.05713, simple_loss=0.07028, pruned_loss=0.01097, audio_tagging_loss=0.01101, over 14943.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09033, pruned_loss=0.01269, audio_tagging_loss=0.009281, over 2897471.16 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:16:43,243 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469550
2023-11-26 00:16:49,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3130320.0, ans=0.0
2023-11-26 00:16:55,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3130386.6666666665, ans=0.125
2023-11-26 00:17:03,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3130453.3333333335, ans=0.09899494936611666
2023-11-26 00:17:16,608 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 650, loss[loss=0.07884, simple_loss=0.1118, pruned_loss=0.01543, audio_tagging_loss=0.007492, over 15700.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.0909, pruned_loss=0.01271, audio_tagging_loss=0.009263, over 2930891.09 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:17:19,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3130520.0, ans=0.1
2023-11-26 00:17:23,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3130520.0, ans=0.125
2023-11-26 00:17:29,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3130586.6666666665, ans=0.2
2023-11-26 00:17:39,108 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469600
2023-11-26 00:17:43,458 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.523e+01 8.552e+01 9.119e+01 9.990e+01 1.151e+02, threshold=1.824e+02, percent-clipped=0.0
2023-11-26 00:18:12,545 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 700, loss[loss=0.06503, simple_loss=0.09261, pruned_loss=0.0109, audio_tagging_loss=0.007819, over 15609.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09139, pruned_loss=0.01267, audio_tagging_loss=0.009254, over 2966541.13 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:18:20,129 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.18 vs. limit=22.5
2023-11-26 00:18:34,305 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469650
2023-11-26 00:18:48,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3131053.3333333335, ans=0.125
2023-11-26 00:18:51,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3131053.3333333335, ans=0.125
2023-11-26 00:19:02,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3131120.0, ans=0.05
2023-11-26 00:19:07,763 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 750, loss[loss=0.07635, simple_loss=0.1045, pruned_loss=0.01509, audio_tagging_loss=0.009015, over 14182.00 frames. ], tot_loss[loss=0.06773, simple_loss=0.09162, pruned_loss=0.01266, audio_tagging_loss=0.009265, over 2984121.58 frames. ], batch size: 53, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:19:12,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3131186.6666666665, ans=0.09899494936611666
2023-11-26 00:19:21,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3131253.3333333335, ans=0.125
2023-11-26 00:19:24,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3131253.3333333335, ans=0.125
2023-11-26 00:19:29,626 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469700
2023-11-26 00:19:34,321 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.489e+01 8.495e+01 9.390e+01 9.960e+01 1.200e+02, threshold=1.878e+02, percent-clipped=0.0
2023-11-26 00:19:35,165 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0
2023-11-26 00:19:44,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3131386.6666666665, ans=10.0
2023-11-26 00:19:45,971 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0
2023-11-26 00:19:56,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3131453.3333333335, ans=0.0
2023-11-26 00:20:03,199 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 800, loss[loss=0.05318, simple_loss=0.06315, pruned_loss=0.008565, audio_tagging_loss=0.01304, over 14759.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09066, pruned_loss=0.01254, audio_tagging_loss=0.009327, over 2991629.86 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 32.0
2023-11-26 00:20:25,562 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469750
2023-11-26 00:20:29,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3131653.3333333335, ans=0.125
2023-11-26 00:20:38,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3131720.0, ans=0.0
2023-11-26 00:20:59,434 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 850, loss[loss=0.06275, simple_loss=0.08874, pruned_loss=0.009444, audio_tagging_loss=0.008937, over 15593.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.09075, pruned_loss=0.01266, audio_tagging_loss=0.009378, over 2998728.09 frames. ], batch size: 60, lr: 1.70e-03, grad_scale: 32.0
2023-11-26 00:21:09,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3131920.0, ans=0.125
2023-11-26 00:21:15,289 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.12 vs. limit=6.0
2023-11-26 00:21:21,151 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469800
2023-11-26 00:21:26,640 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 8.683e+01 9.047e+01 1.001e+02 1.303e+02, threshold=1.809e+02, percent-clipped=0.0
2023-11-26 00:21:33,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3132053.3333333335, ans=0.0
2023-11-26 00:21:34,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3132053.3333333335, ans=0.09899494936611666
2023-11-26 00:21:34,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3132053.3333333335, ans=0.125
2023-11-26 00:21:47,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3132120.0, ans=0.125
2023-11-26 00:21:55,551 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 900, loss[loss=0.09231, simple_loss=0.1319, pruned_loss=0.0185, audio_tagging_loss=0.007837, over 15392.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.08989, pruned_loss=0.01257, audio_tagging_loss=0.00946, over 3008675.44 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:22:09,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3132253.3333333335, ans=0.2
2023-11-26 00:22:18,259 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469850
2023-11-26 00:22:52,231 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 950, loss[loss=0.07639, simple_loss=0.1044, pruned_loss=0.01752, audio_tagging_loss=0.006653, over 15511.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.08959, pruned_loss=0.0125, audio_tagging_loss=0.009351, over 3016530.86 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:23:07,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3132586.6666666665, ans=0.0
2023-11-26 00:23:14,149 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469900
2023-11-26 00:23:19,286 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.398e+01 8.462e+01 9.352e+01 1.021e+02 1.286e+02, threshold=1.870e+02, percent-clipped=0.0
2023-11-26 00:23:30,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3132720.0, ans=0.125
2023-11-26 00:23:33,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3132720.0, ans=0.0
2023-11-26 00:23:34,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3132720.0, ans=0.125
2023-11-26 00:23:47,646 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1000, loss[loss=0.072, simple_loss=0.1017, pruned_loss=0.01608, audio_tagging_loss=0.005069, over 15027.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08881, pruned_loss=0.01241, audio_tagging_loss=0.009208, over 3014347.39 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:24:03,817 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.92 vs. limit=15.0
2023-11-26 00:24:06,580 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.59 vs. limit=12.0
2023-11-26 00:24:10,094 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 469950
2023-11-26 00:24:12,219 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 00:24:19,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3132986.6666666665, ans=0.0
2023-11-26 00:24:23,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3133053.3333333335, ans=0.0
2023-11-26 00:24:34,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3133120.0, ans=0.1
2023-11-26 00:24:36,601 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 00:24:37,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3133120.0, ans=0.0
2023-11-26 00:24:43,927 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1050, loss[loss=0.05748, simple_loss=0.07763, pruned_loss=0.007761, audio_tagging_loss=0.0109, over 14973.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08877, pruned_loss=0.01241, audio_tagging_loss=0.009004, over 3015264.17 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:24:44,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3133186.6666666665, ans=0.0
2023-11-26 00:24:59,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3133253.3333333335, ans=0.0
2023-11-26 00:25:00,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3133253.3333333335, ans=0.125
2023-11-26 00:25:06,357 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.64 vs. limit=15.0
2023-11-26 00:25:06,967 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470000
2023-11-26 00:25:12,413 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.119e+01 8.549e+01 9.309e+01 1.004e+02 1.287e+02, threshold=1.862e+02, percent-clipped=0.0
2023-11-26 00:25:12,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3133320.0, ans=0.0
2023-11-26 00:25:24,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3133386.6666666665, ans=0.1
2023-11-26 00:25:40,164 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1100, loss[loss=0.05863, simple_loss=0.0781, pruned_loss=0.01024, audio_tagging_loss=0.009344, over 15246.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08876, pruned_loss=0.01248, audio_tagging_loss=0.008942, over 3021647.32 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:25:40,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3133520.0, ans=0.125
2023-11-26 00:25:43,726 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.22 vs. limit=15.0
2023-11-26 00:25:44,372 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 00:25:44,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3133520.0, ans=0.125
2023-11-26 00:25:47,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3133520.0, ans=0.1
2023-11-26 00:25:49,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=3133520.0, ans=0.1
2023-11-26 00:25:52,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3133586.6666666665, ans=0.07
2023-11-26 00:26:02,874 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470050
2023-11-26 00:26:05,778 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.85 vs. limit=15.0
2023-11-26 00:26:07,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3133653.3333333335, ans=0.0
2023-11-26 00:26:29,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3133786.6666666665, ans=0.0
2023-11-26 00:26:36,637 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1150, loss[loss=0.0834, simple_loss=0.1099, pruned_loss=0.01902, audio_tagging_loss=0.009431, over 15653.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.0899, pruned_loss=0.0127, audio_tagging_loss=0.008912, over 3022742.78 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:26:38,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3133853.3333333335, ans=0.125
2023-11-26 00:26:41,815 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.55 vs. limit=10.0
2023-11-26 00:26:43,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3133853.3333333335, ans=0.0
2023-11-26 00:26:48,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3133920.0, ans=0.1
2023-11-26 00:26:58,329 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470100
2023-11-26 00:27:04,013 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.254e+01 8.538e+01 9.163e+01 1.008e+02 1.257e+02, threshold=1.833e+02, percent-clipped=0.0
2023-11-26 00:27:08,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3133986.6666666665, ans=0.0
2023-11-26 00:27:18,329 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.78 vs. limit=10.0
2023-11-26 00:27:20,440 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.75 vs. limit=15.0
2023-11-26 00:27:28,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3134120.0, ans=0.0
2023-11-26 00:27:28,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3134120.0, ans=0.125
2023-11-26 00:27:32,274 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1200, loss[loss=0.07054, simple_loss=0.09404, pruned_loss=0.01544, audio_tagging_loss=0.008077, over 15017.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.08972, pruned_loss=0.01284, audio_tagging_loss=0.00896, over 3020462.88 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 32.0
2023-11-26 00:27:49,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3134253.3333333335, ans=0.125
2023-11-26 00:27:55,248 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470150
2023-11-26 00:28:03,431 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.35 vs. limit=15.0
2023-11-26 00:28:19,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3134453.3333333335, ans=0.125
2023-11-26 00:28:27,690 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1250, loss[loss=0.07032, simple_loss=0.09902, pruned_loss=0.01404, audio_tagging_loss=0.006766, over 14746.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.08976, pruned_loss=0.0128, audio_tagging_loss=0.008894, over 3017985.01 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 32.0
2023-11-26 00:28:27,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3134520.0, ans=0.0
2023-11-26 00:28:36,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3134520.0, ans=0.0
2023-11-26 00:28:49,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3134653.3333333335, ans=0.1
2023-11-26 00:28:50,653 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470200
2023-11-26 00:28:50,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3134653.3333333335, ans=0.2
2023-11-26 00:28:55,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3134653.3333333335, ans=0.125
2023-11-26 00:28:56,136 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.365e+01 8.537e+01 9.082e+01 9.508e+01 1.462e+02, threshold=1.816e+02, percent-clipped=0.0
2023-11-26 00:29:04,120 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.94 vs. limit=6.0
2023-11-26 00:29:23,793 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1300, loss[loss=0.06566, simple_loss=0.09444, pruned_loss=0.009955, audio_tagging_loss=0.008489, over 16670.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08906, pruned_loss=0.01253, audio_tagging_loss=0.008848, over 3029087.02 frames. ], batch size: 63, lr: 1.70e-03, grad_scale: 32.0
2023-11-26 00:29:43,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3134920.0, ans=0.0
2023-11-26 00:29:45,602 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470250
2023-11-26 00:30:10,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3135120.0, ans=0.0
2023-11-26 00:30:19,467 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1350, loss[loss=0.07348, simple_loss=0.1072, pruned_loss=0.0125, audio_tagging_loss=0.007356, over 15095.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09028, pruned_loss=0.01272, audio_tagging_loss=0.008755, over 3033316.34 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 32.0
2023-11-26 00:30:22,231 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=22.5
2023-11-26 00:30:29,539 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.41 vs. limit=22.5
2023-11-26 00:30:42,540 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470300
2023-11-26 00:30:47,645 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.032e+01 8.371e+01 9.120e+01 9.741e+01 1.134e+02, threshold=1.824e+02, percent-clipped=0.0
2023-11-26 00:30:49,116 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.07 vs. limit=22.5
2023-11-26 00:31:00,812 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 00:31:14,772 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1400, loss[loss=0.06315, simple_loss=0.09064, pruned_loss=0.008977, audio_tagging_loss=0.008847, over 16007.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09079, pruned_loss=0.01277, audio_tagging_loss=0.008755, over 3038657.63 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:31:20,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3135520.0, ans=0.0
2023-11-26 00:31:36,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3135586.6666666665, ans=10.0
2023-11-26 00:31:38,596 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470350
2023-11-26 00:31:54,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3135720.0, ans=10.0
2023-11-26 00:31:59,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3135786.6666666665, ans=0.0
2023-11-26 00:32:04,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3135786.6666666665, ans=0.0
2023-11-26 00:32:10,248 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.04 vs. limit=22.5
2023-11-26 00:32:11,754 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1450, loss[loss=0.06803, simple_loss=0.09353, pruned_loss=0.01124, audio_tagging_loss=0.01003, over 15774.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09052, pruned_loss=0.01264, audio_tagging_loss=0.008888, over 3045294.11 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:32:21,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3135853.3333333335, ans=0.125
2023-11-26 00:32:23,588 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.32 vs. limit=15.0
2023-11-26 00:32:27,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3135920.0, ans=0.125
2023-11-26 00:32:27,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3135920.0, ans=0.125
2023-11-26 00:32:29,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3135920.0, ans=0.125
2023-11-26 00:32:33,909 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470400
2023-11-26 00:32:36,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3135986.6666666665, ans=0.2
2023-11-26 00:32:40,435 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.317e+01 8.703e+01 9.390e+01 1.022e+02 1.337e+02, threshold=1.878e+02, percent-clipped=0.0
2023-11-26 00:32:51,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3136053.3333333335, ans=0.0
2023-11-26 00:33:04,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=3136120.0, ans=0.02
2023-11-26 00:33:08,192 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1500, loss[loss=0.05512, simple_loss=0.06831, pruned_loss=0.008871, audio_tagging_loss=0.01209, over 14324.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09081, pruned_loss=0.01292, audio_tagging_loss=0.009016, over 3042635.07 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:33:23,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3136253.3333333335, ans=0.125
2023-11-26 00:33:30,716 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470450
2023-11-26 00:33:41,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3136386.6666666665, ans=0.125
2023-11-26 00:34:03,650 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1550, loss[loss=0.06944, simple_loss=0.09124, pruned_loss=0.01355, audio_tagging_loss=0.01028, over 15628.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09001, pruned_loss=0.01277, audio_tagging_loss=0.009238, over 3043254.70 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:34:13,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3136520.0, ans=0.2
2023-11-26 00:34:25,961 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=15.0
limit=15.0 2023-11-26 00:34:26,673 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470500 2023-11-26 00:34:33,570 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.285e+01 8.668e+01 9.304e+01 9.957e+01 1.824e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 00:34:35,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3136653.3333333335, ans=0.125 2023-11-26 00:34:38,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3136720.0, ans=0.0 2023-11-26 00:34:38,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3136720.0, ans=0.125 2023-11-26 00:34:40,464 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.00 vs. limit=15.0 2023-11-26 00:34:48,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3136786.6666666665, ans=0.09899494936611666 2023-11-26 00:34:56,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3136786.6666666665, ans=0.125 2023-11-26 00:34:59,541 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1600, loss[loss=0.05296, simple_loss=0.0713, pruned_loss=0.007743, audio_tagging_loss=0.009567, over 15068.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.08912, pruned_loss=0.01252, audio_tagging_loss=0.009321, over 3039626.78 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:35:22,166 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470550 2023-11-26 00:35:26,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3136986.6666666665, ans=0.0 2023-11-26 00:35:49,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3137120.0, ans=0.2 2023-11-26 00:35:55,998 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1650, loss[loss=0.0551, simple_loss=0.07514, pruned_loss=0.009593, audio_tagging_loss=0.007934, over 15178.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.08974, pruned_loss=0.01257, audio_tagging_loss=0.009353, over 3039081.79 frames. 
], batch size: 58, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:35:57,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3137186.6666666665, ans=0.1 2023-11-26 00:36:01,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3137186.6666666665, ans=0.1 2023-11-26 00:36:03,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3137186.6666666665, ans=0.0 2023-11-26 00:36:06,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3137253.3333333335, ans=0.125 2023-11-26 00:36:07,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3137253.3333333335, ans=0.125 2023-11-26 00:36:17,245 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470600 2023-11-26 00:36:24,355 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.265e+01 8.506e+01 9.125e+01 1.020e+02 1.203e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-26 00:36:26,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3137320.0, ans=0.0 2023-11-26 00:36:39,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3137453.3333333335, ans=0.2 2023-11-26 00:36:40,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3137453.3333333335, ans=0.125 2023-11-26 00:36:51,056 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1700, loss[loss=0.05068, simple_loss=0.06993, pruned_loss=0.008189, audio_tagging_loss=0.007528, over 15835.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.0899, pruned_loss=0.0125, audio_tagging_loss=0.009402, over 3041649.95 frames. ], batch size: 61, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:36:54,531 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 00:36:55,988 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.71 vs. 
limit=15.0 2023-11-26 00:36:58,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3137520.0, ans=0.1 2023-11-26 00:37:04,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3137586.6666666665, ans=0.125 2023-11-26 00:37:08,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3137586.6666666665, ans=0.2 2023-11-26 00:37:12,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3137653.3333333335, ans=0.125 2023-11-26 00:37:13,043 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470650 2023-11-26 00:37:17,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3137653.3333333335, ans=0.0 2023-11-26 00:37:21,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3137653.3333333335, ans=0.025 2023-11-26 00:37:21,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3137653.3333333335, ans=0.2 2023-11-26 00:37:24,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3137720.0, ans=0.05 2023-11-26 00:37:26,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3137720.0, ans=0.125 2023-11-26 00:37:46,332 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1750, loss[loss=0.06622, simple_loss=0.0928, pruned_loss=0.009422, audio_tagging_loss=0.0104, over 15774.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.08987, pruned_loss=0.01253, audio_tagging_loss=0.00922, over 3042611.44 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:37:49,918 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.53 vs. limit=15.0 2023-11-26 00:37:55,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3137853.3333333335, ans=0.1 2023-11-26 00:37:55,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3137853.3333333335, ans=0.1 2023-11-26 00:37:56,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3137920.0, ans=0.2 2023-11-26 00:37:57,708 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.56 vs. limit=8.0 2023-11-26 00:38:06,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3137920.0, ans=0.0 2023-11-26 00:38:08,753 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470700 2023-11-26 00:38:09,485 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.77 vs. 
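limit=15.0

Note on the [optim.py:476] entries, such as the one immediately below: the five "grad-norm quartiles" are the min/25%/median/75%/max of gradient norms collected over recent optimizer steps, and throughout this log the reported threshold equals Clipping_scale times the median (below, 2.0 * 8.977e+01 = 1.795e+02, matching threshold=1.795e+02); percent-clipped then reports how often a norm exceeded that threshold. A minimal sketch of this bookkeeping, assuming a buffer of recent norms (illustrative only, not the actual icefall optimizer code):

    import torch

    def clipping_report(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
        # recent_norms: 1-D float tensor of gradient norms from recent steps.
        q = torch.quantile(recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]  # Clipping_scale times the median norm
        percent_clipped = 100.0 * (recent_norms > threshold).float().mean()
        return q, threshold, percent_clipped
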
2023-11-26 00:38:16,170 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.541e+01 8.977e+01 9.696e+01 1.531e+02, threshold=1.795e+02, percent-clipped=0.0
2023-11-26 00:38:28,505 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.09 vs. limit=15.0
2023-11-26 00:38:31,681 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.38 vs. limit=15.0
2023-11-26 00:38:35,841 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.23 vs. limit=15.0
2023-11-26 00:38:42,298 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1800, loss[loss=0.07462, simple_loss=0.1002, pruned_loss=0.0165, audio_tagging_loss=0.008044, over 14938.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.08956, pruned_loss=0.0125, audio_tagging_loss=0.009155, over 3040545.11 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:39:00,434 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.75 vs. limit=15.0
2023-11-26 00:39:03,972 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470750
2023-11-26 00:39:14,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3138386.6666666665, ans=0.125
2023-11-26 00:39:26,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3138453.3333333335, ans=0.0
2023-11-26 00:39:32,710 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0
2023-11-26 00:39:33,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3138453.3333333335, ans=0.125
2023-11-26 00:39:37,374 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1850, loss[loss=0.04651, simple_loss=0.0606, pruned_loss=0.006509, audio_tagging_loss=0.009701, over 15527.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.08972, pruned_loss=0.0125, audio_tagging_loss=0.009032, over 3042444.42 frames.
], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:39:40,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3138520.0, ans=0.07 2023-11-26 00:39:51,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3138586.6666666665, ans=0.0 2023-11-26 00:39:59,031 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470800 2023-11-26 00:40:02,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3138653.3333333335, ans=0.125 2023-11-26 00:40:03,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3138653.3333333335, ans=0.0 2023-11-26 00:40:07,148 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.457e+01 8.736e+01 9.136e+01 9.723e+01 1.171e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-26 00:40:11,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3138720.0, ans=0.2 2023-11-26 00:40:12,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3138720.0, ans=0.2 2023-11-26 00:40:13,762 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.94 vs. limit=15.0 2023-11-26 00:40:19,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3138720.0, ans=0.0 2023-11-26 00:40:32,781 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1900, loss[loss=0.05134, simple_loss=0.06778, pruned_loss=0.0121, audio_tagging_loss=0.005351, over 14035.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08934, pruned_loss=0.01243, audio_tagging_loss=0.00895, over 3039751.98 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:40:33,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3138853.3333333335, ans=0.0 2023-11-26 00:40:55,387 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470850 2023-11-26 00:41:02,054 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.31 vs. limit=10.0 2023-11-26 00:41:15,498 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 00:41:28,555 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 1950, loss[loss=0.07375, simple_loss=0.08731, pruned_loss=0.01693, audio_tagging_loss=0.01317, over 14879.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.0892, pruned_loss=0.01253, audio_tagging_loss=0.008975, over 3041834.00 frames. 
], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:41:48,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3139253.3333333335, ans=0.125 2023-11-26 00:41:50,581 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470900 2023-11-26 00:41:59,048 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 8.427e+01 9.159e+01 1.002e+02 1.233e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-26 00:41:59,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3139320.0, ans=0.2 2023-11-26 00:42:02,784 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.71 vs. limit=15.0 2023-11-26 00:42:04,566 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 00:42:05,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3139386.6666666665, ans=0.2 2023-11-26 00:42:24,506 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2000, loss[loss=0.0614, simple_loss=0.07511, pruned_loss=0.01161, audio_tagging_loss=0.01224, over 15676.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08945, pruned_loss=0.01256, audio_tagging_loss=0.008917, over 3043487.68 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:42:40,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3139586.6666666665, ans=0.2 2023-11-26 00:42:46,816 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 470950 2023-11-26 00:42:48,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3139653.3333333335, ans=0.0 2023-11-26 00:43:09,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3139786.6666666665, ans=0.04949747468305833 2023-11-26 00:43:11,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3139786.6666666665, ans=0.0 2023-11-26 00:43:14,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3139786.6666666665, ans=0.1 2023-11-26 00:43:19,545 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2050, loss[loss=0.08953, simple_loss=0.1201, pruned_loss=0.01716, audio_tagging_loss=0.01234, over 14905.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08977, pruned_loss=0.01259, audio_tagging_loss=0.008903, over 3039070.58 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:43:28,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3139853.3333333335, ans=0.0 2023-11-26 00:43:41,856 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471000 2023-11-26 00:43:42,320 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.14 vs. 
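limit=15.0

Note on the [scaling.py:213] ScheduledFloat entries that dominate this log: quantities such as dropout probabilities, skip rates and bypass scale floors are not fixed hyperparameters but piecewise-linear functions of the global batch count; each entry records the parameter name, the batch_count at which it was evaluated, and the resulting value (ans). A sketch of the interpolation mechanism, with made-up breakpoints for illustration (the real schedules are defined in the model code):

    import bisect

    def scheduled_float(batch_count: float, schedule: list) -> float:
        # schedule: [(batch_count, value), ...], sorted by batch_count;
        # linear interpolation between breakpoints, clamped at both ends.
        xs = [x for x, _ in schedule]
        ys = [y for _, y in schedule]
        if batch_count <= xs[0]:
            return ys[0]
        if batch_count >= xs[-1]:
            return ys[-1]
        i = bisect.bisect_right(xs, batch_count) - 1
        t = (batch_count - xs[i]) / (xs[i + 1] - xs[i])
        return ys[i] + t * (ys[i + 1] - ys[i])

    # A skip rate decaying from 0.2 to 0.0 over the first 20k batches has long
    # since reached its final value at batch_count ~= 3.14e6, hence ans=0.0:
    assert scheduled_float(3140253.0, [(0.0, 0.2), (20000.0, 0.0)]) == 0.0
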
2023-11-26 00:43:50,089 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.223e+01 8.583e+01 9.206e+01 9.963e+01 1.276e+02, threshold=1.841e+02, percent-clipped=0.0
2023-11-26 00:43:55,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3140053.3333333335, ans=0.125
2023-11-26 00:44:06,207 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 00:44:08,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3140120.0, ans=0.0
2023-11-26 00:44:15,674 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2100, loss[loss=0.0595, simple_loss=0.07903, pruned_loss=0.0115, audio_tagging_loss=0.008485, over 15084.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08909, pruned_loss=0.01239, audio_tagging_loss=0.008853, over 3038988.17 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 32.0
2023-11-26 00:44:15,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3140186.6666666665, ans=0.1
2023-11-26 00:44:22,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3140186.6666666665, ans=0.125
2023-11-26 00:44:25,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3140186.6666666665, ans=0.125
2023-11-26 00:44:30,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3140253.3333333335, ans=0.0
2023-11-26 00:44:38,000 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471050
2023-11-26 00:44:39,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3140320.0, ans=0.0
2023-11-26 00:45:01,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3140453.3333333335, ans=0.05
2023-11-26 00:45:04,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3140453.3333333335, ans=0.2
2023-11-26 00:45:07,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3140453.3333333335, ans=0.2
2023-11-26 00:45:11,034 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2150, loss[loss=0.05922, simple_loss=0.07633, pruned_loss=0.01058, audio_tagging_loss=0.01047, over 14527.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08913, pruned_loss=0.01242, audio_tagging_loss=0.008835, over 3041365.39 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:45:24,805 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.28 vs.
limit=12.0 2023-11-26 00:45:27,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3140586.6666666665, ans=0.125 2023-11-26 00:45:33,538 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471100 2023-11-26 00:45:41,970 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.211e+01 8.773e+01 9.255e+01 9.995e+01 1.124e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-26 00:45:42,585 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.97 vs. limit=22.5 2023-11-26 00:45:45,890 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 00:45:54,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3140786.6666666665, ans=0.125 2023-11-26 00:45:59,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3140786.6666666665, ans=0.125 2023-11-26 00:46:06,709 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2200, loss[loss=0.07372, simple_loss=0.1062, pruned_loss=0.01237, audio_tagging_loss=0.008251, over 15370.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08967, pruned_loss=0.01244, audio_tagging_loss=0.008767, over 3047419.64 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:46:08,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3140853.3333333335, ans=10.0 2023-11-26 00:46:29,017 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471150 2023-11-26 00:46:39,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3141053.3333333335, ans=0.0 2023-11-26 00:46:48,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3141053.3333333335, ans=0.1 2023-11-26 00:46:51,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3141120.0, ans=0.125 2023-11-26 00:46:53,434 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.30 vs. limit=15.0 2023-11-26 00:46:56,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3141120.0, ans=0.125 2023-11-26 00:46:59,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3141120.0, ans=0.125 2023-11-26 00:47:01,673 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2250, loss[loss=0.07005, simple_loss=0.09603, pruned_loss=0.01216, audio_tagging_loss=0.009881, over 15227.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09026, pruned_loss=0.01252, audio_tagging_loss=0.008789, over 3040901.39 frames. 
], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:47:17,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3141253.3333333335, ans=0.025 2023-11-26 00:47:23,517 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471200 2023-11-26 00:47:32,697 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.277e+01 8.619e+01 9.398e+01 1.009e+02 1.153e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-26 00:47:34,346 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.65 vs. limit=15.0 2023-11-26 00:47:46,598 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.72 vs. limit=15.0 2023-11-26 00:47:49,815 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.88 vs. limit=22.5 2023-11-26 00:47:52,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3141453.3333333335, ans=0.1 2023-11-26 00:47:56,771 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.44 vs. limit=22.5 2023-11-26 00:47:57,261 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2300, loss[loss=0.08003, simple_loss=0.1105, pruned_loss=0.01715, audio_tagging_loss=0.00762, over 14960.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.08997, pruned_loss=0.01252, audio_tagging_loss=0.008863, over 3041582.97 frames. ], batch size: 53, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:48:09,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3141586.6666666665, ans=0.125 2023-11-26 00:48:19,124 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.31 vs. limit=22.5 2023-11-26 00:48:19,698 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471250 2023-11-26 00:48:32,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3141720.0, ans=0.05 2023-11-26 00:48:38,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3141720.0, ans=0.1 2023-11-26 00:48:42,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3141786.6666666665, ans=0.125 2023-11-26 00:48:43,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3141786.6666666665, ans=0.125 2023-11-26 00:48:46,546 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
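Number of tokens: 24

Note on these WARNING entries: the excluded cuts are 1-second clips (100 feature frames) whose transcript is the literal placeholder "Dummy text added as a place holder...", which suggests audio muxed in for the audio-tagging objective rather than for ASR. The check itself is a length filter: after the front-end subsampling, 100 frames shrink to 23, and a transducer loss cannot align 24 BPE tokens to only 23 frames, so the cut is dropped. A minimal sketch of such a filter; the function name and the exact subsampling formula are assumptions, chosen so that 100 input frames map to the 23 reported in the log:

    def keep_cut(num_frames: int, num_tokens: int, subsampling_factor: int = 4) -> bool:
        # assumed front-end formula: roughly 4x time reduction, 100 -> 23
        frames_after_subsampling = (num_frames - 7) // subsampling_factor
        return frames_after_subsampling >= num_tokens

    assert not keep_cut(num_frames=100, num_tokens=24)  # the excluded 1.000 s cuts
    assert keep_cut(num_frames=1500, num_tokens=24)     # a typical 15 s utterance
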
2023-11-26 00:48:47,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3141786.6666666665, ans=0.0
2023-11-26 00:48:52,341 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2350, loss[loss=0.07097, simple_loss=0.1021, pruned_loss=0.01284, audio_tagging_loss=0.007065, over 14658.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.08974, pruned_loss=0.01244, audio_tagging_loss=0.008927, over 3041255.91 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 00:48:54,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3141853.3333333335, ans=0.1
2023-11-26 00:49:06,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3141920.0, ans=0.0
2023-11-26 00:49:11,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3141920.0, ans=0.2
2023-11-26 00:49:14,614 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471300
2023-11-26 00:49:21,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3141986.6666666665, ans=0.2
2023-11-26 00:49:23,015 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.222e+01 8.557e+01 9.249e+01 9.915e+01 1.418e+02, threshold=1.850e+02, percent-clipped=0.0
2023-11-26 00:49:27,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3142053.3333333335, ans=0.125
2023-11-26 00:49:30,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3142053.3333333335, ans=0.125
2023-11-26 00:49:44,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3142120.0, ans=0.0
2023-11-26 00:49:48,004 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2400, loss[loss=0.07486, simple_loss=0.1006, pruned_loss=0.01394, audio_tagging_loss=0.01061, over 15542.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.0895, pruned_loss=0.01254, audio_tagging_loss=0.009032, over 3037774.93 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 32.0
2023-11-26 00:49:56,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3142186.6666666665, ans=0.1
2023-11-26 00:50:09,810 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471350
2023-11-26 00:50:31,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3142453.3333333335, ans=0.1
2023-11-26 00:50:41,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3142453.3333333335, ans=0.0
2023-11-26 00:50:43,038 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2450, loss[loss=0.08739, simple_loss=0.1243, pruned_loss=0.01787, audio_tagging_loss=0.007358, over 15212.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.08994, pruned_loss=0.01251, audio_tagging_loss=0.009146, over 3045439.86 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 32.0
2023-11-26 00:50:46,811 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.79 vs.
limit=22.5 2023-11-26 00:50:53,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3142586.6666666665, ans=0.0 2023-11-26 00:51:04,632 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471400 2023-11-26 00:51:13,854 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.694e+01 9.441e+01 1.025e+02 1.251e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-26 00:51:20,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3142720.0, ans=0.125 2023-11-26 00:51:29,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3142786.6666666665, ans=0.0 2023-11-26 00:51:33,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3142786.6666666665, ans=0.125 2023-11-26 00:51:35,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3142786.6666666665, ans=10.0 2023-11-26 00:51:37,662 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2500, loss[loss=0.0372, simple_loss=0.04133, pruned_loss=0.006874, audio_tagging_loss=0.009664, over 13316.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08894, pruned_loss=0.01226, audio_tagging_loss=0.009191, over 3037424.84 frames. ], batch size: 53, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:51:41,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3142853.3333333335, ans=0.125 2023-11-26 00:51:58,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3142920.0, ans=0.2 2023-11-26 00:51:58,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3142920.0, ans=0.125 2023-11-26 00:52:00,125 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471450 2023-11-26 00:52:03,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3142986.6666666665, ans=0.0 2023-11-26 00:52:13,638 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.03 vs. limit=15.0 2023-11-26 00:52:33,324 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2550, loss[loss=0.07077, simple_loss=0.1012, pruned_loss=0.01309, audio_tagging_loss=0.007099, over 14463.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08872, pruned_loss=0.01228, audio_tagging_loss=0.009132, over 3030544.02 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:52:33,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3143186.6666666665, ans=0.0 2023-11-26 00:52:37,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3143186.6666666665, ans=0.125 2023-11-26 00:52:43,688 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.89 vs. 
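limit=6.0

Note on the [scaling.py:1022] Whitening entries: each compares a measured statistic against a limit. The metric is 1.0 when the covariance of the activations (grouped into num_groups blocks of channels) is proportional to the identity, and grows as channels become correlated or unequal in scale; the module only applies its corrective gradient when the metric exceeds the limit, which none of the nearby entries do. In spirit the diagnostic is the following ratio (a sketch of the single-group case, not the actual scaling.py code, which also handles grouping and the backward-pass intervention):

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels). Via trace identities this is the mean
        # squared eigenvalue of the covariance divided by the squared mean
        # eigenvalue: exactly 1.0 iff the covariance is a multiple of I.
        c = x.t() @ x / x.shape[0]
        n = c.shape[0]
        return n * torch.trace(c @ c) / (torch.trace(c) ** 2 + 1e-20)

    print(whitening_metric(torch.randn(10000, 256)))  # ~1.0 for white noise
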
2023-11-26 00:52:54,903 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471500
2023-11-26 00:53:03,215 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.431e+01 8.571e+01 9.048e+01 1.003e+02 1.375e+02, threshold=1.810e+02, percent-clipped=0.0
2023-11-26 00:53:14,078 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.96 vs. limit=15.0
2023-11-26 00:53:26,253 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.02 vs. limit=22.5
2023-11-26 00:53:27,801 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2600, loss[loss=0.06492, simple_loss=0.09789, pruned_loss=0.008452, audio_tagging_loss=0.007528, over 16026.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08938, pruned_loss=0.01242, audio_tagging_loss=0.008971, over 3034107.88 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 32.0
2023-11-26 00:53:36,026 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.24 vs. limit=15.0
2023-11-26 00:53:39,733 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 00:53:49,057 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471550
2023-11-26 00:54:16,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3143786.6666666665, ans=0.125
2023-11-26 00:54:17,947 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.50 vs. limit=15.0
2023-11-26 00:54:20,926 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.36 vs. limit=15.0
2023-11-26 00:54:22,453 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2650, loss[loss=0.07835, simple_loss=0.1006, pruned_loss=0.01787, audio_tagging_loss=0.01018, over 15272.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.0902, pruned_loss=0.01265, audio_tagging_loss=0.008831, over 3034590.84 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 32.0
2023-11-26 00:54:23,907 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.67 vs. limit=12.0
2023-11-26 00:54:44,943 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471600
2023-11-26 00:54:54,181 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.039e+01 8.622e+01 9.253e+01 1.030e+02 1.251e+02, threshold=1.851e+02, percent-clipped=0.0
2023-11-26 00:55:07,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3144120.0, ans=0.125
2023-11-26 00:55:18,669 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2700, loss[loss=0.05671, simple_loss=0.0702, pruned_loss=0.01118, audio_tagging_loss=0.01043, over 16483.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09083, pruned_loss=0.01282, audio_tagging_loss=0.008742, over 3037301.21 frames.
], batch size: 64, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:55:22,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3144186.6666666665, ans=0.1 2023-11-26 00:55:32,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3144253.3333333335, ans=0.1 2023-11-26 00:55:39,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3144253.3333333335, ans=0.2 2023-11-26 00:55:41,212 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471650 2023-11-26 00:55:47,109 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.54 vs. limit=22.5 2023-11-26 00:56:00,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3144386.6666666665, ans=0.0 2023-11-26 00:56:15,111 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2750, loss[loss=0.08279, simple_loss=0.1122, pruned_loss=0.01785, audio_tagging_loss=0.008812, over 14981.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09041, pruned_loss=0.01277, audio_tagging_loss=0.008765, over 3041172.46 frames. ], batch size: 53, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:56:28,371 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.92 vs. limit=12.0 2023-11-26 00:56:36,219 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471700 2023-11-26 00:56:45,693 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.793e+01 8.564e+01 9.385e+01 1.025e+02 1.216e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 00:57:03,943 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 00:57:10,307 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2800, loss[loss=0.0872, simple_loss=0.12, pruned_loss=0.02022, audio_tagging_loss=0.006971, over 16037.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09013, pruned_loss=0.01268, audio_tagging_loss=0.00881, over 3039748.85 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:57:20,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3144920.0, ans=0.125 2023-11-26 00:57:26,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3144920.0, ans=10.0 2023-11-26 00:57:33,152 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471750 2023-11-26 00:57:55,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3145120.0, ans=0.2 2023-11-26 00:58:05,874 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2850, loss[loss=0.06522, simple_loss=0.08592, pruned_loss=0.01178, audio_tagging_loss=0.01049, over 15507.00 frames. 
], tot_loss[loss=0.06599, simple_loss=0.08921, pruned_loss=0.01255, audio_tagging_loss=0.008836, over 3039663.61 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:58:10,045 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.95 vs. limit=15.0 2023-11-26 00:58:10,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3145186.6666666665, ans=0.035 2023-11-26 00:58:14,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3145186.6666666665, ans=0.1 2023-11-26 00:58:28,841 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471800 2023-11-26 00:58:36,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3145320.0, ans=0.125 2023-11-26 00:58:38,474 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.919e+01 8.895e+01 9.329e+01 9.789e+01 1.221e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-26 00:59:02,280 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2900, loss[loss=0.07102, simple_loss=0.1019, pruned_loss=0.01213, audio_tagging_loss=0.00795, over 15948.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09065, pruned_loss=0.01269, audio_tagging_loss=0.008797, over 3036472.05 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:59:06,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3145520.0, ans=0.0 2023-11-26 00:59:06,553 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.02 vs. limit=22.5 2023-11-26 00:59:18,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3145586.6666666665, ans=0.125 2023-11-26 00:59:24,306 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471850 2023-11-26 00:59:24,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3145653.3333333335, ans=0.125 2023-11-26 00:59:24,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3145653.3333333335, ans=0.125 2023-11-26 00:59:36,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3145720.0, ans=0.125 2023-11-26 00:59:39,540 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.61 vs. limit=15.0 2023-11-26 00:59:47,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3145786.6666666665, ans=0.0 2023-11-26 00:59:58,011 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 2950, loss[loss=0.08714, simple_loss=0.1177, pruned_loss=0.02009, audio_tagging_loss=0.008178, over 14277.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09061, pruned_loss=0.01287, audio_tagging_loss=0.008856, over 3038399.35 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 01:00:11,141 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.94 vs. 
limit=15.0 2023-11-26 01:00:20,332 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471900 2023-11-26 01:00:22,971 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.41 vs. limit=22.5 2023-11-26 01:00:31,951 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.912e+01 8.856e+01 9.351e+01 9.999e+01 2.175e+02, threshold=1.870e+02, percent-clipped=2.0 2023-11-26 01:00:47,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3146120.0, ans=0.0 2023-11-26 01:00:50,891 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.20 vs. limit=22.5 2023-11-26 01:00:53,313 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3000, loss[loss=0.07091, simple_loss=0.09653, pruned_loss=0.01344, audio_tagging_loss=0.009207, over 15732.00 frames. ], tot_loss[loss=0.06753, simple_loss=0.09145, pruned_loss=0.01301, audio_tagging_loss=0.008803, over 3044841.46 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 8.0 2023-11-26 01:00:53,315 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-26 01:01:25,512 INFO [train_asr.py:1267] (0/4) Epoch 40, validation: loss=0.05777, simple_loss=0.05069, pruned_loss=0.005189, audio_tagging_loss=0.02724, over 4681554.00 frames. 2023-11-26 01:01:25,513 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-26 01:01:27,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3146186.6666666665, ans=0.2 2023-11-26 01:01:46,799 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 471950 2023-11-26 01:02:03,315 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.77 vs. limit=10.0 2023-11-26 01:02:20,665 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3050, loss[loss=0.05244, simple_loss=0.07532, pruned_loss=0.005544, audio_tagging_loss=0.009232, over 15759.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.09122, pruned_loss=0.01287, audio_tagging_loss=0.008925, over 3050482.31 frames. 
], batch size: 61, lr: 1.70e-03, grad_scale: 8.0 2023-11-26 01:02:20,921 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:02:33,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3146586.6666666665, ans=0.125 2023-11-26 01:02:38,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3146586.6666666665, ans=0.0 2023-11-26 01:02:42,825 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472000 2023-11-26 01:02:44,178 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-472000.pt 2023-11-26 01:02:47,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3146653.3333333335, ans=0.125 2023-11-26 01:02:56,921 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.712e+01 9.411e+01 1.021e+02 1.458e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-26 01:02:56,985 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 01:03:08,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3146786.6666666665, ans=0.125 2023-11-26 01:03:18,251 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3100, loss[loss=0.07997, simple_loss=0.1186, pruned_loss=0.01515, audio_tagging_loss=0.005492, over 15231.00 frames. ], tot_loss[loss=0.06801, simple_loss=0.09219, pruned_loss=0.01297, audio_tagging_loss=0.008946, over 3052026.19 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 8.0 2023-11-26 01:03:24,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3146853.3333333335, ans=0.125 2023-11-26 01:03:24,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3146853.3333333335, ans=0.1 2023-11-26 01:03:41,115 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472050 2023-11-26 01:03:46,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3146986.6666666665, ans=0.125 2023-11-26 01:03:53,622 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.32 vs. limit=15.0 2023-11-26 01:03:57,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3147053.3333333335, ans=0.2 2023-11-26 01:04:14,257 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3150, loss[loss=0.06206, simple_loss=0.08162, pruned_loss=0.01217, audio_tagging_loss=0.009078, over 15642.00 frames. ], tot_loss[loss=0.06786, simple_loss=0.09201, pruned_loss=0.01285, audio_tagging_loss=0.009003, over 3049824.39 frames. 
], batch size: 59, lr: 1.70e-03, grad_scale: 8.0 2023-11-26 01:04:14,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3147186.6666666665, ans=0.125 2023-11-26 01:04:17,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3147186.6666666665, ans=0.0 2023-11-26 01:04:18,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3147186.6666666665, ans=0.125 2023-11-26 01:04:36,242 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472100 2023-11-26 01:04:47,286 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.998e+01 8.868e+01 9.358e+01 9.908e+01 1.230e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-26 01:04:49,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3147386.6666666665, ans=0.1 2023-11-26 01:05:07,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3147453.3333333335, ans=0.0 2023-11-26 01:05:09,982 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3200, loss[loss=0.06642, simple_loss=0.08618, pruned_loss=0.01241, audio_tagging_loss=0.01092, over 14326.00 frames. ], tot_loss[loss=0.06754, simple_loss=0.09113, pruned_loss=0.01279, audio_tagging_loss=0.009183, over 3052953.27 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 01:05:17,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3147520.0, ans=0.125 2023-11-26 01:05:32,074 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472150 2023-11-26 01:05:40,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3147653.3333333335, ans=0.1 2023-11-26 01:05:48,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3147720.0, ans=0.0 2023-11-26 01:06:04,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3147853.3333333335, ans=0.1 2023-11-26 01:06:04,940 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3250, loss[loss=0.065, simple_loss=0.09254, pruned_loss=0.01075, audio_tagging_loss=0.007985, over 15649.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.09061, pruned_loss=0.01282, audio_tagging_loss=0.009244, over 3051991.08 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 01:06:06,721 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.01 vs. limit=15.0 2023-11-26 01:06:10,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3147853.3333333335, ans=0.1 2023-11-26 01:06:21,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3147920.0, ans=0.0 2023-11-26 01:06:21,462 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.29 vs. 
limit=15.0 2023-11-26 01:06:26,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3147986.6666666665, ans=0.0 2023-11-26 01:06:27,278 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472200 2023-11-26 01:06:28,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3147986.6666666665, ans=0.125 2023-11-26 01:06:38,676 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.433e+01 8.733e+01 9.362e+01 1.015e+02 1.651e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-26 01:06:59,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=3148120.0, ans=0.2 2023-11-26 01:07:00,541 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.26 vs. limit=15.0 2023-11-26 01:07:01,108 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3300, loss[loss=0.08304, simple_loss=0.1072, pruned_loss=0.01776, audio_tagging_loss=0.01168, over 16809.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.08933, pruned_loss=0.01259, audio_tagging_loss=0.009251, over 3053912.67 frames. ], batch size: 63, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 01:07:08,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3148186.6666666665, ans=0.2 2023-11-26 01:07:19,787 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.35 vs. limit=15.0 2023-11-26 01:07:23,488 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472250 2023-11-26 01:07:24,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3148320.0, ans=0.125 2023-11-26 01:07:32,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3148320.0, ans=0.1 2023-11-26 01:07:37,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3148386.6666666665, ans=0.04949747468305833 2023-11-26 01:07:57,002 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3350, loss[loss=0.05432, simple_loss=0.07434, pruned_loss=0.009319, audio_tagging_loss=0.00783, over 15418.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.08967, pruned_loss=0.01277, audio_tagging_loss=0.00921, over 3054412.12 frames. 
], batch size: 61, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 01:08:05,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3148520.0, ans=0.0 2023-11-26 01:08:19,836 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472300 2023-11-26 01:08:24,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3148653.3333333335, ans=0.1 2023-11-26 01:08:30,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3148720.0, ans=0.125 2023-11-26 01:08:30,885 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.038e+01 8.683e+01 9.249e+01 1.019e+02 1.203e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-26 01:08:33,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3148720.0, ans=0.2 2023-11-26 01:08:52,803 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3400, loss[loss=0.06539, simple_loss=0.09542, pruned_loss=0.01226, audio_tagging_loss=0.005424, over 16508.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.08958, pruned_loss=0.01274, audio_tagging_loss=0.009051, over 3049587.76 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 01:08:58,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3148853.3333333335, ans=0.125 2023-11-26 01:09:10,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3148920.0, ans=0.2 2023-11-26 01:09:15,660 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472350 2023-11-26 01:09:38,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3149120.0, ans=0.125 2023-11-26 01:09:43,801 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:09:48,859 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3450, loss[loss=0.0507, simple_loss=0.06433, pruned_loss=0.007695, audio_tagging_loss=0.01084, over 15262.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09, pruned_loss=0.01281, audio_tagging_loss=0.008975, over 3047792.17 frames. ], batch size: 60, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 01:09:58,513 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2023-11-26 01:10:11,438 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472400 2023-11-26 01:10:12,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3149320.0, ans=0.125 2023-11-26 01:10:18,644 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.82 vs. 
2023-11-26 01:10:22,056 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.081e+01 8.810e+01 9.451e+01 1.004e+02 1.366e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-26 01:10:31,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3149386.6666666665, ans=0.2 2023-11-26 01:10:39,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3149453.3333333335, ans=0.125 2023-11-26 01:10:41,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3149453.3333333335, ans=0.125 2023-11-26 01:10:45,052 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3500, loss[loss=0.05023, simple_loss=0.06993, pruned_loss=0.006235, audio_tagging_loss=0.009025, over 14806.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.0904, pruned_loss=0.01278, audio_tagging_loss=0.008778, over 3050463.41 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 01:10:50,994 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.84 vs. limit=10.0 2023-11-26 01:11:03,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3149586.6666666665, ans=0.0 2023-11-26 01:11:04,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3149586.6666666665, ans=0.125 2023-11-26 01:11:08,038 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472450 2023-11-26 01:11:09,971 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.55 vs. limit=5.0 2023-11-26 01:11:15,504 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 01:11:32,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3149786.6666666665, ans=0.1 2023-11-26 01:11:40,873 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3550, loss[loss=0.05783, simple_loss=0.07733, pruned_loss=0.009451, audio_tagging_loss=0.009718, over 13678.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09028, pruned_loss=0.01262, audio_tagging_loss=0.008763, over 3049710.06 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 16.0
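
The Exclude-cut WARNING above (01:11:15) drops an AudioSet placeholder cut whose encoder output would be shorter than its token sequence: after subsampling, the 100-frame cut keeps 23 frames but carries 24 BPE tokens, and a transducer cannot emit more symbols than it has encoder frames. A sketch of that rule; the subsample() formula is an illustrative stand-in for the model's roughly 4x reduction (100 frames to 23), not the exact expression it uses:

```python
# Sketch of the exclusion rule implied by the WARNING above: a cut is dropped
# when it has fewer post-subsampling frames than BPE tokens, since the
# transducer cannot emit more symbols than it has encoder frames. The
# subsample() formula is an illustrative stand-in for the model's ~4x
# reduction (100 frames -> 23), not the exact expression it uses.
def subsample(num_frames: int) -> int:
    return (num_frames - 8) // 4

def keep_cut(num_frames_before_subsampling: int, num_tokens: int) -> bool:
    return subsample(num_frames_before_subsampling) >= num_tokens

print(keep_cut(100, 24))  # False -> the cut is excluded, as logged
```
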
2023-11-26 01:11:59,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3149920.0, ans=0.2 2023-11-26 01:12:04,002 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472500 2023-11-26 01:12:05,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3149986.6666666665, ans=0.125 2023-11-26 01:12:14,567 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.986e+01 8.583e+01 9.059e+01 9.596e+01 1.364e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-26 01:12:22,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3150053.3333333335, ans=0.0 2023-11-26 01:12:27,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3150120.0, ans=0.0 2023-11-26 01:12:32,226 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2023-11-26 01:12:37,551 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3600, loss[loss=0.05698, simple_loss=0.07095, pruned_loss=0.01298, audio_tagging_loss=0.008522, over 15388.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.08969, pruned_loss=0.0127, audio_tagging_loss=0.0088, over 3049992.65 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 01:12:46,149 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.48 vs. limit=6.0 2023-11-26 01:12:50,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3150253.3333333335, ans=0.1 2023-11-26 01:12:53,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3150253.3333333335, ans=0.125 2023-11-26 01:12:59,413 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472550 2023-11-26 01:12:59,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3150320.0, ans=0.0 2023-11-26 01:12:59,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3150320.0, ans=0.1 2023-11-26 01:13:00,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3150320.0, ans=0.2 2023-11-26 01:13:17,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3150386.6666666665, ans=0.0 2023-11-26 01:13:23,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3150453.3333333335, ans=0.125 2023-11-26 01:13:24,928 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.05 vs. limit=15.0 2023-11-26 01:13:33,442 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3650, loss[loss=0.05284, simple_loss=0.07357, pruned_loss=0.008505, audio_tagging_loss=0.007555, over 14578.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09018, pruned_loss=0.0128, audio_tagging_loss=0.008686, over 3049671.97 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0
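
The grad_scale values attached to the loss entries move in powers of two (16.0 at batch 3550 above, 32.0 at batch 3600, back to 16.0 at batch 3650), which is the signature of dynamic fp16 loss scaling: the scale doubles after a run of overflow-free steps and is halved when gradients overflow. A generic torch GradScaler configured to illustrate the pattern; the growth_interval value is invented, since the log does not reveal the real cadence:

```python
import torch

# The grad_scale values logged with each loss entry (16 -> 32 -> 16, later 8)
# follow the usual dynamic fp16 loss-scaling pattern: double the scale after
# a run of overflow-free steps, halve it when gradients overflow. A generic
# torch GradScaler showing the knobs involved; growth_interval is invented.
scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0,       # the value seen around batch 3550
    growth_factor=2.0,     # 16 -> 32, as at batch 3600
    backoff_factor=0.5,    # 32 -> 16, as at batch 3650
    growth_interval=2000,  # illustrative; not shown in the log
)
# Typical step: scaler.scale(loss).backward(); scaler.step(optimizer);
# scaler.update()  # update() is where the scale grows or backs off
```
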
2023-11-26 01:13:36,145 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.96 vs. limit=15.0 2023-11-26 01:13:40,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3150520.0, ans=0.125 2023-11-26 01:13:49,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3150586.6666666665, ans=0.0 2023-11-26 01:13:55,252 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472600 2023-11-26 01:13:55,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3150653.3333333335, ans=0.125 2023-11-26 01:14:08,553 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.534e+01 8.763e+01 9.129e+01 9.774e+01 1.635e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-26 01:14:24,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3150786.6666666665, ans=0.125 2023-11-26 01:14:26,826 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:14:28,163 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.83 vs. limit=15.0 2023-11-26 01:14:28,702 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3700, loss[loss=0.07676, simple_loss=0.1033, pruned_loss=0.01561, audio_tagging_loss=0.009518, over 15866.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.08982, pruned_loss=0.01267, audio_tagging_loss=0.008652, over 3048806.98 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:14:29,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3150853.3333333335, ans=0.0 2023-11-26 01:14:45,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3150920.0, ans=0.2 2023-11-26 01:14:52,234 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472650 2023-11-26 01:15:10,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3151053.3333333335, ans=0.125 2023-11-26 01:15:11,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3151053.3333333335, ans=0.125 2023-11-26 01:15:11,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3151053.3333333335, ans=0.125 2023-11-26 01:15:20,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3151120.0, ans=0.125 2023-11-26 01:15:25,892 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3750, loss[loss=0.07925, simple_loss=0.1034, pruned_loss=0.01963, audio_tagging_loss=0.007895, over 15739.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09042, pruned_loss=0.01282, audio_tagging_loss=0.008691, over 3050629.41 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 16.0
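
The Whitening entries compare a per-module statistic against a limit (e.g. metric=6.96 vs. limit=15.0 above); the metric is small when a module's output covariance is close to isotropic and grows as the variance collapses onto a few directions. A generic whiteness statistic of that kind, as an illustrative stand-in rather than the exact computation in scaling.py:

```python
import torch

# A generic "whiteness" statistic of the kind these Whitening entries appear
# to track: for a channel covariance C (n x n), n * tr(C @ C) / tr(C)**2 is
# 1.0 when C is a multiple of the identity (perfectly white) and grows as
# the variance concentrates in fewer directions. This is an illustrative
# stand-in, not the exact metric computed in scaling.py.
def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels)
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    n = cov.shape[0]
    return (n * torch.trace(cov @ cov) / torch.trace(cov) ** 2).item()

print(whitening_metric(torch.randn(10000, 384)))  # ~1.0, far below limit=15.0
```
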
2023-11-26 01:15:30,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3151186.6666666665, ans=0.015 2023-11-26 01:15:41,970 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.21 vs. limit=6.0 2023-11-26 01:15:42,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3151253.3333333335, ans=0.0 2023-11-26 01:15:47,756 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472700 2023-11-26 01:15:54,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3151320.0, ans=0.125 2023-11-26 01:15:59,314 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.274e+01 8.900e+01 9.433e+01 1.035e+02 1.729e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 01:16:06,228 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 01:16:21,662 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3800, loss[loss=0.06628, simple_loss=0.09489, pruned_loss=0.01084, audio_tagging_loss=0.007993, over 16848.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09078, pruned_loss=0.01295, audio_tagging_loss=0.008682, over 3053759.75 frames. ], batch size: 61, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:16:28,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3151520.0, ans=0.09899494936611666 2023-11-26 01:16:33,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3151586.6666666665, ans=0.0 2023-11-26 01:16:41,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3151586.6666666665, ans=0.125 2023-11-26 01:16:43,122 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472750 2023-11-26 01:16:45,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3151653.3333333335, ans=0.125 2023-11-26 01:16:45,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3151653.3333333335, ans=0.125 2023-11-26 01:16:49,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3151653.3333333335, ans=0.1 2023-11-26 01:16:54,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3151720.0, ans=0.125 2023-11-26 01:17:16,314 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3850, loss[loss=0.05824, simple_loss=0.07596, pruned_loss=0.009795, audio_tagging_loss=0.01046, over 14759.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09066, pruned_loss=0.0129, audio_tagging_loss=0.008783, over 3054132.19 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0
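
The ScheduledFloat entries report hyperparameters (skip rates, balancer probabilities, dropout) as functions of the global batch_count; by batch_count ≈ 3.15e6 they have long since settled at their final values, e.g. ans=0.0 for most skip rates above. A simplified piecewise-linear schedule in the same spirit; the breakpoints below are invented for illustration, and the actual ScheduledFloat in scaling.py may differ:

```python
import bisect

# Simplified stand-in for the batch-count-driven schedules behind the
# ScheduledFloat entries (name=..., batch_count=..., ans=...). Each value is
# treated as a piecewise-linear function of the global batch count; the
# breakpoints below are invented and are not the real ones.
class PiecewiseLinearSchedule:
    def __init__(self, *points: tuple[float, float]):
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# A skip rate decaying from 0.5 to 0.0 over the first 16k batches would read
# ans=0.0 at batch_count ~ 3.15e6, like most skip rates logged here:
conv_skip_rate = PiecewiseLinearSchedule((0.0, 0.5), (16000.0, 0.0))
print(conv_skip_rate(3151586.6666666665))  # 0.0
```
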
2023-11-26 01:17:25,401 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.71 vs. limit=22.5 2023-11-26 01:17:32,407 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.53 vs. limit=22.5 2023-11-26 01:17:39,146 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472800 2023-11-26 01:17:39,727 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.52 vs. limit=22.5 2023-11-26 01:17:51,402 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.434e+01 8.590e+01 9.252e+01 9.700e+01 1.619e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-26 01:18:05,920 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.33 vs. limit=10.0 2023-11-26 01:18:12,592 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3900, loss[loss=0.06723, simple_loss=0.09233, pruned_loss=0.01167, audio_tagging_loss=0.009396, over 15375.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09035, pruned_loss=0.01284, audio_tagging_loss=0.008927, over 3042220.52 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:18:15,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3152186.6666666665, ans=0.125 2023-11-26 01:18:22,330 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=15.0 2023-11-26 01:18:34,790 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472850 2023-11-26 01:18:52,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3152386.6666666665, ans=0.125 2023-11-26 01:19:04,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3152453.3333333335, ans=0.0 2023-11-26 01:19:08,181 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 3950, loss[loss=0.07959, simple_loss=0.1134, pruned_loss=0.01424, audio_tagging_loss=0.008635, over 15218.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09007, pruned_loss=0.01271, audio_tagging_loss=0.008961, over 3037570.58 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0
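
Each loss entry pairs the current batch's loss[...] with tot_loss[...], a frame-weighted running average over roughly three million frames. A minimal version of such a tracker; this is illustrative, and the training script's real bookkeeping, including how its window decays so the frame count hovers near 3M, is not shown in the log:

```python
# Minimal frame-weighted running average of the kind reported as
# tot_loss[...] "over N frames": each batch contributes in proportion to its
# frame count. Illustrative only; the real tracker also decays or resets its
# window, which is why the logged frame count stays near 3M instead of
# growing without bound.
class RunningLoss:
    def __init__(self) -> None:
        self.loss_sum = 0.0  # sum of loss * frames
        self.frames = 0.0

    def update(self, loss: float, num_frames: float) -> None:
        self.loss_sum += loss * num_frames
        self.frames += num_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = RunningLoss()
tracker.update(0.07959, 15218.0)  # the per-batch loss at batch 3950 above
print(tracker.value)              # 0.07959 after one batch
```
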
2023-11-26 01:19:08,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3152520.0, ans=0.125 2023-11-26 01:19:18,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3152586.6666666665, ans=0.125 2023-11-26 01:19:19,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3152586.6666666665, ans=0.1 2023-11-26 01:19:29,416 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472900 2023-11-26 01:19:40,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3152720.0, ans=0.125 2023-11-26 01:19:42,516 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 8.671e+01 9.267e+01 9.996e+01 1.170e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-26 01:19:46,034 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:19:53,109 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.32 vs. limit=22.5 2023-11-26 01:20:03,254 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4000, loss[loss=0.07218, simple_loss=0.09699, pruned_loss=0.01667, audio_tagging_loss=0.007007, over 14658.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.08984, pruned_loss=0.01278, audio_tagging_loss=0.00908, over 3039781.11 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 01:20:22,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3152920.0, ans=0.1 2023-11-26 01:20:26,157 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 472950 2023-11-26 01:20:38,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3153053.3333333335, ans=0.0 2023-11-26 01:20:45,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3153053.3333333335, ans=0.1 2023-11-26 01:20:51,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3153120.0, ans=0.2 2023-11-26 01:20:54,888 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.57 vs. limit=15.0 2023-11-26 01:20:58,608 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4050, loss[loss=0.08088, simple_loss=0.1026, pruned_loss=0.01986, audio_tagging_loss=0.009708, over 14975.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09042, pruned_loss=0.01275, audio_tagging_loss=0.00908, over 3047000.13 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 01:21:00,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3153186.6666666665, ans=0.125 2023-11-26 01:21:03,963 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible.
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 01:21:06,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3153186.6666666665, ans=0.0 2023-11-26 01:21:11,313 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.34 vs. limit=15.0 2023-11-26 01:21:22,158 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473000 2023-11-26 01:21:22,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3153320.0, ans=0.125 2023-11-26 01:21:29,952 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:21:35,080 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.157e+01 8.884e+01 9.464e+01 1.024e+02 1.208e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 01:21:43,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3153453.3333333335, ans=0.0 2023-11-26 01:21:50,839 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:21:55,905 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4100, loss[loss=0.05134, simple_loss=0.06785, pruned_loss=0.007635, audio_tagging_loss=0.00978, over 15882.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.09093, pruned_loss=0.01277, audio_tagging_loss=0.009135, over 3046290.02 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:21:59,036 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=15.0 2023-11-26 01:22:12,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3153586.6666666665, ans=0.1 2023-11-26 01:22:17,727 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473050 2023-11-26 01:22:20,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3153653.3333333335, ans=0.1 2023-11-26 01:22:25,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3153653.3333333335, ans=0.025 2023-11-26 01:22:41,535 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.98 vs. limit=22.5 2023-11-26 01:22:42,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3153786.6666666665, ans=0.125 2023-11-26 01:22:44,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3153786.6666666665, ans=0.0 2023-11-26 01:22:51,536 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4150, loss[loss=0.07649, simple_loss=0.1073, pruned_loss=0.015, audio_tagging_loss=0.007821, over 15272.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.09072, pruned_loss=0.01274, audio_tagging_loss=0.009084, over 3046950.09 frames. 
], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:22:53,312 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.03 vs. limit=10.0 2023-11-26 01:22:53,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3153853.3333333335, ans=0.125 2023-11-26 01:23:13,797 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473100 2023-11-26 01:23:20,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3153986.6666666665, ans=0.0 2023-11-26 01:23:27,357 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.811e+01 8.761e+01 9.353e+01 9.782e+01 1.109e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 01:23:31,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3154053.3333333335, ans=0.125 2023-11-26 01:23:32,739 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 01:23:33,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3154053.3333333335, ans=0.0 2023-11-26 01:23:40,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3154120.0, ans=0.125 2023-11-26 01:23:41,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3154120.0, ans=0.0 2023-11-26 01:23:46,537 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4200, loss[loss=0.08859, simple_loss=0.1234, pruned_loss=0.0195, audio_tagging_loss=0.007371, over 14989.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.09099, pruned_loss=0.01275, audio_tagging_loss=0.008939, over 3048639.87 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:23:48,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3154186.6666666665, ans=0.0 2023-11-26 01:23:56,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3154186.6666666665, ans=0.125 2023-11-26 01:24:01,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3154253.3333333335, ans=0.09899494936611666 2023-11-26 01:24:06,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3154253.3333333335, ans=0.125 2023-11-26 01:24:10,143 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473150 2023-11-26 01:24:22,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3154386.6666666665, ans=0.2 2023-11-26 01:24:29,883 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.18 vs. 
limit=15.0 2023-11-26 01:24:42,953 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4250, loss[loss=0.07072, simple_loss=0.09528, pruned_loss=0.01432, audio_tagging_loss=0.008767, over 15562.00 frames. ], tot_loss[loss=0.06726, simple_loss=0.09136, pruned_loss=0.01272, audio_tagging_loss=0.008858, over 3043394.78 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:25:00,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3154586.6666666665, ans=0.0 2023-11-26 01:25:01,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3154586.6666666665, ans=0.0 2023-11-26 01:25:05,332 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473200 2023-11-26 01:25:19,420 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.652e+01 8.719e+01 9.230e+01 1.020e+02 1.385e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-26 01:25:39,086 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4300, loss[loss=0.06638, simple_loss=0.08798, pruned_loss=0.01287, audio_tagging_loss=0.009514, over 15835.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09077, pruned_loss=0.0126, audio_tagging_loss=0.008853, over 3046018.59 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:26:01,430 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473250 2023-11-26 01:26:01,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3154986.6666666665, ans=0.0 2023-11-26 01:26:03,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3154986.6666666665, ans=0.125 2023-11-26 01:26:11,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3155053.3333333335, ans=0.07 2023-11-26 01:26:18,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3155053.3333333335, ans=0.1 2023-11-26 01:26:18,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3155053.3333333335, ans=0.125 2023-11-26 01:26:21,933 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.13 vs. limit=15.0 2023-11-26 01:26:32,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3155120.0, ans=0.1 2023-11-26 01:26:34,030 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4350, loss[loss=0.05712, simple_loss=0.06563, pruned_loss=0.0116, audio_tagging_loss=0.0127, over 14940.00 frames. ], tot_loss[loss=0.06757, simple_loss=0.09187, pruned_loss=0.01284, audio_tagging_loss=0.008794, over 3046289.60 frames. 
], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:26:38,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3155186.6666666665, ans=0.125 2023-11-26 01:26:56,911 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473300 2023-11-26 01:26:57,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3155320.0, ans=0.0 2023-11-26 01:27:05,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3155320.0, ans=0.0 2023-11-26 01:27:09,930 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.910e+01 8.594e+01 9.290e+01 1.001e+02 1.319e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-26 01:27:29,693 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.21 vs. limit=12.0 2023-11-26 01:27:30,057 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4400, loss[loss=0.04913, simple_loss=0.06408, pruned_loss=0.00762, audio_tagging_loss=0.009465, over 14923.00 frames. ], tot_loss[loss=0.06751, simple_loss=0.0915, pruned_loss=0.01299, audio_tagging_loss=0.008763, over 3041362.13 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 01:27:52,527 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473350 2023-11-26 01:27:56,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3155653.3333333335, ans=0.1 2023-11-26 01:28:25,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3155853.3333333335, ans=0.125 2023-11-26 01:28:26,504 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4450, loss[loss=0.05654, simple_loss=0.06937, pruned_loss=0.009969, audio_tagging_loss=0.01189, over 14407.00 frames. ], tot_loss[loss=0.06782, simple_loss=0.09199, pruned_loss=0.01304, audio_tagging_loss=0.00878, over 3043180.25 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 01:28:34,616 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.07 vs. limit=15.0 2023-11-26 01:28:42,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3155920.0, ans=0.1 2023-11-26 01:28:43,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3155920.0, ans=0.1 2023-11-26 01:28:48,836 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473400 2023-11-26 01:29:02,392 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 8.765e+01 9.296e+01 9.987e+01 1.152e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-26 01:29:12,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3156120.0, ans=0.125 2023-11-26 01:29:18,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3156120.0, ans=0.0 2023-11-26 01:29:22,005 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4500, loss[loss=0.06026, simple_loss=0.07051, pruned_loss=0.01303, audio_tagging_loss=0.01197, over 14942.00 frames. 
], tot_loss[loss=0.06752, simple_loss=0.09187, pruned_loss=0.01285, audio_tagging_loss=0.008733, over 3050558.81 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:29:26,841 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.55 vs. limit=15.0 2023-11-26 01:29:44,846 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473450 2023-11-26 01:29:54,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3156320.0, ans=0.125 2023-11-26 01:29:59,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3156386.6666666665, ans=0.04949747468305833 2023-11-26 01:30:03,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3156386.6666666665, ans=0.125 2023-11-26 01:30:06,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3156453.3333333335, ans=0.125 2023-11-26 01:30:13,094 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.10 vs. limit=12.0 2023-11-26 01:30:17,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3156520.0, ans=0.125 2023-11-26 01:30:18,336 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4550, loss[loss=0.06717, simple_loss=0.08697, pruned_loss=0.01173, audio_tagging_loss=0.01196, over 14607.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09172, pruned_loss=0.01281, audio_tagging_loss=0.00868, over 3043710.37 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:30:29,084 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.59 vs. limit=15.0 2023-11-26 01:30:33,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3156586.6666666665, ans=0.035 2023-11-26 01:30:40,745 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473500 2023-11-26 01:30:48,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3156653.3333333335, ans=0.125 2023-11-26 01:30:56,710 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.568e+01 8.762e+01 9.232e+01 1.004e+02 1.439e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-26 01:31:02,007 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 01:31:04,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3156786.6666666665, ans=0.0 2023-11-26 01:31:10,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3156786.6666666665, ans=0.0 2023-11-26 01:31:14,162 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4600, loss[loss=0.07299, simple_loss=0.09911, pruned_loss=0.01693, audio_tagging_loss=0.006498, over 15695.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09119, pruned_loss=0.01268, audio_tagging_loss=0.008719, over 3043228.12 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:31:21,595 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.94 vs. limit=10.0 2023-11-26 01:31:28,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3156920.0, ans=0.125 2023-11-26 01:31:29,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3156920.0, ans=0.0 2023-11-26 01:31:34,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3156986.6666666665, ans=10.0 2023-11-26 01:31:35,984 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473550 2023-11-26 01:31:55,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3157053.3333333335, ans=0.125 2023-11-26 01:32:05,014 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.11 vs. limit=15.0 2023-11-26 01:32:09,595 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.77 vs. limit=12.0 2023-11-26 01:32:10,069 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4650, loss[loss=0.06042, simple_loss=0.07952, pruned_loss=0.01139, audio_tagging_loss=0.00927, over 14615.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09046, pruned_loss=0.01254, audio_tagging_loss=0.008876, over 3047343.01 frames. 
], batch size: 56, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:32:12,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3157186.6666666665, ans=0.2 2023-11-26 01:32:17,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3157186.6666666665, ans=0.0 2023-11-26 01:32:21,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3157253.3333333335, ans=0.05 2023-11-26 01:32:32,755 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473600 2023-11-26 01:32:48,455 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.585e+01 8.767e+01 9.190e+01 1.011e+02 1.594e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-26 01:32:55,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3157453.3333333335, ans=0.125 2023-11-26 01:33:00,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3157453.3333333335, ans=0.1 2023-11-26 01:33:02,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3157453.3333333335, ans=0.1 2023-11-26 01:33:06,160 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.75 vs. limit=15.0 2023-11-26 01:33:06,573 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4700, loss[loss=0.07622, simple_loss=0.1151, pruned_loss=0.01249, audio_tagging_loss=0.006188, over 15226.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09001, pruned_loss=0.01253, audio_tagging_loss=0.008981, over 3042364.27 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:33:11,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=3157520.0, ans=0.1 2023-11-26 01:33:22,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3157586.6666666665, ans=0.1 2023-11-26 01:33:28,372 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473650 2023-11-26 01:33:32,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3157653.3333333335, ans=0.2 2023-11-26 01:33:59,764 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.48 vs. limit=12.0 2023-11-26 01:34:02,333 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4750, loss[loss=0.07014, simple_loss=0.09826, pruned_loss=0.0116, audio_tagging_loss=0.009411, over 15765.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09022, pruned_loss=0.01253, audio_tagging_loss=0.009046, over 3040898.02 frames. 
], batch size: 60, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:34:16,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3157920.0, ans=0.125 2023-11-26 01:34:24,129 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473700 2023-11-26 01:34:40,728 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.460e+01 8.549e+01 9.197e+01 1.001e+02 1.331e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-26 01:34:51,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3158120.0, ans=0.125 2023-11-26 01:34:57,739 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4800, loss[loss=0.0747, simple_loss=0.09849, pruned_loss=0.01562, audio_tagging_loss=0.009835, over 15054.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09058, pruned_loss=0.01265, audio_tagging_loss=0.009112, over 3041844.82 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:35:16,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3158253.3333333335, ans=0.5 2023-11-26 01:35:20,581 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473750 2023-11-26 01:35:26,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3158320.0, ans=0.0 2023-11-26 01:35:28,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3158320.0, ans=0.0 2023-11-26 01:35:30,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3158386.6666666665, ans=0.125 2023-11-26 01:35:31,440 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.26 vs. limit=22.5 2023-11-26 01:35:39,872 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.55 vs. limit=10.0 2023-11-26 01:35:47,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3158453.3333333335, ans=0.1 2023-11-26 01:35:54,534 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4850, loss[loss=0.073, simple_loss=0.1013, pruned_loss=0.01323, audio_tagging_loss=0.009119, over 14325.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.08968, pruned_loss=0.01246, audio_tagging_loss=0.00934, over 3042890.04 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:36:01,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3158520.0, ans=0.125 2023-11-26 01:36:05,286 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.97 vs. 
limit=15.0 2023-11-26 01:36:13,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3158586.6666666665, ans=0.0 2023-11-26 01:36:16,393 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473800 2023-11-26 01:36:18,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3158653.3333333335, ans=0.0 2023-11-26 01:36:19,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3158653.3333333335, ans=0.0 2023-11-26 01:36:32,057 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.661e+01 9.165e+01 9.886e+01 1.284e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-26 01:36:39,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3158786.6666666665, ans=0.125 2023-11-26 01:36:48,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3158786.6666666665, ans=0.125 2023-11-26 01:36:50,747 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4900, loss[loss=0.07433, simple_loss=0.103, pruned_loss=0.0151, audio_tagging_loss=0.007746, over 15429.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09089, pruned_loss=0.01267, audio_tagging_loss=0.009107, over 3048790.65 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:37:03,026 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.85 vs. limit=15.0 2023-11-26 01:37:12,770 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473850 2023-11-26 01:37:46,082 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 4950, loss[loss=0.05909, simple_loss=0.06633, pruned_loss=0.01202, audio_tagging_loss=0.01391, over 16762.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.0903, pruned_loss=0.01249, audio_tagging_loss=0.009033, over 3041764.72 frames. ], batch size: 63, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:37:59,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=3159253.3333333335, ans=0.1 2023-11-26 01:38:09,191 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473900 2023-11-26 01:38:17,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3159320.0, ans=0.125 2023-11-26 01:38:24,452 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.979e+01 8.576e+01 9.071e+01 1.006e+02 1.445e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-26 01:38:24,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3159386.6666666665, ans=0.0 2023-11-26 01:38:26,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3159386.6666666665, ans=0.125 2023-11-26 01:38:35,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3159453.3333333335, ans=0.125 2023-11-26 01:38:41,908 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5000, loss[loss=0.05377, simple_loss=0.07692, pruned_loss=0.009611, audio_tagging_loss=0.005701, over 15042.00 frames. 
], tot_loss[loss=0.06685, simple_loss=0.09078, pruned_loss=0.01263, audio_tagging_loss=0.008828, over 3045475.05 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:39:04,615 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 473950 2023-11-26 01:39:12,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3159653.3333333335, ans=0.2 2023-11-26 01:39:12,337 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2023-11-26 01:39:19,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3159720.0, ans=0.0 2023-11-26 01:39:30,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3159786.6666666665, ans=0.125 2023-11-26 01:39:31,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3159786.6666666665, ans=0.125 2023-11-26 01:39:38,333 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5050, loss[loss=0.05483, simple_loss=0.08006, pruned_loss=0.007462, audio_tagging_loss=0.007336, over 15148.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09104, pruned_loss=0.01273, audio_tagging_loss=0.008778, over 3047476.48 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:39:42,102 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.38 vs. limit=22.5 2023-11-26 01:39:48,780 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.52 vs. 
limit=5.0 2023-11-26 01:39:49,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3159920.0, ans=0.125 2023-11-26 01:39:52,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3159920.0, ans=0.125 2023-11-26 01:39:59,832 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474000 2023-11-26 01:40:11,425 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:40:11,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3160053.3333333335, ans=0.125 2023-11-26 01:40:12,514 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:40:13,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3160053.3333333335, ans=0.125 2023-11-26 01:40:16,447 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.715e+01 8.830e+01 9.284e+01 9.877e+01 1.399e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-26 01:40:21,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3160053.3333333335, ans=0.125 2023-11-26 01:40:28,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3160120.0, ans=0.05 2023-11-26 01:40:29,465 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.42 vs. limit=15.0 2023-11-26 01:40:33,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3160186.6666666665, ans=0.125 2023-11-26 01:40:34,112 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5100, loss[loss=0.06906, simple_loss=0.08921, pruned_loss=0.01346, audio_tagging_loss=0.011, over 14819.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.0914, pruned_loss=0.01267, audio_tagging_loss=0.008755, over 3051333.91 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:40:34,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3160186.6666666665, ans=0.1 2023-11-26 01:40:47,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3160253.3333333335, ans=10.0 2023-11-26 01:40:56,375 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474050 2023-11-26 01:41:28,750 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5150, loss[loss=0.0765, simple_loss=0.1027, pruned_loss=0.0178, audio_tagging_loss=0.007324, over 15734.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09158, pruned_loss=0.01279, audio_tagging_loss=0.008669, over 3054869.09 frames. 
], batch size: 57, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:41:51,708 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474100 2023-11-26 01:41:58,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3160653.3333333335, ans=0.125 2023-11-26 01:42:03,745 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.85 vs. limit=15.0 2023-11-26 01:42:07,593 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.243e+01 8.876e+01 9.273e+01 9.906e+01 1.225e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 01:42:25,187 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5200, loss[loss=0.06074, simple_loss=0.06709, pruned_loss=0.01499, audio_tagging_loss=0.0122, over 14815.00 frames. ], tot_loss[loss=0.06763, simple_loss=0.09175, pruned_loss=0.01305, audio_tagging_loss=0.008701, over 3054265.06 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:42:42,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3160920.0, ans=0.125 2023-11-26 01:42:46,950 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474150 2023-11-26 01:42:48,440 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.58 vs. limit=15.0 2023-11-26 01:43:20,741 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5250, loss[loss=0.07727, simple_loss=0.09947, pruned_loss=0.01649, audio_tagging_loss=0.01106, over 15894.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.09167, pruned_loss=0.01288, audio_tagging_loss=0.008667, over 3050139.14 frames. ], batch size: 61, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:43:33,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3161253.3333333335, ans=0.1 2023-11-26 01:43:36,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3161253.3333333335, ans=0.125 2023-11-26 01:43:43,070 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474200 2023-11-26 01:43:47,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3161320.0, ans=0.09899494936611666 2023-11-26 01:43:55,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3161386.6666666665, ans=0.0 2023-11-26 01:44:00,247 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.738e+01 9.374e+01 1.008e+02 2.043e+02, threshold=1.875e+02, percent-clipped=1.0 2023-11-26 01:44:01,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3161386.6666666665, ans=0.2 2023-11-26 01:44:11,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3161453.3333333335, ans=0.0 2023-11-26 01:44:16,259 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5300, loss[loss=0.05311, simple_loss=0.07574, pruned_loss=0.009048, audio_tagging_loss=0.006187, over 14199.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.09175, pruned_loss=0.01287, audio_tagging_loss=0.008639, over 3048570.97 frames. 
], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:44:24,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3161520.0, ans=0.125 2023-11-26 01:44:29,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3161586.6666666665, ans=0.0 2023-11-26 01:44:39,574 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474250 2023-11-26 01:44:41,950 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.13 vs. limit=15.0 2023-11-26 01:44:45,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=3161653.3333333335, ans=0.02 2023-11-26 01:44:56,778 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.58 vs. limit=22.5 2023-11-26 01:45:00,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3161786.6666666665, ans=0.2 2023-11-26 01:45:12,052 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5350, loss[loss=0.0727, simple_loss=0.1058, pruned_loss=0.01179, audio_tagging_loss=0.008013, over 15720.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09146, pruned_loss=0.01289, audio_tagging_loss=0.008722, over 3053310.43 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:45:18,863 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.91 vs. limit=15.0 2023-11-26 01:45:20,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3161853.3333333335, ans=0.125 2023-11-26 01:45:31,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3161920.0, ans=0.125 2023-11-26 01:45:34,079 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474300 2023-11-26 01:45:36,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3161986.6666666665, ans=0.0 2023-11-26 01:45:50,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3162053.3333333335, ans=0.125 2023-11-26 01:45:50,943 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 8.668e+01 9.369e+01 1.006e+02 1.281e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-26 01:45:54,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3162053.3333333335, ans=0.1 2023-11-26 01:45:54,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3162053.3333333335, ans=0.1 2023-11-26 01:46:06,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3162120.0, ans=0.125 2023-11-26 01:46:06,597 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.55 vs. 
limit=22.5 2023-11-26 01:46:08,073 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5400, loss[loss=0.07324, simple_loss=0.1081, pruned_loss=0.01159, audio_tagging_loss=0.007589, over 16153.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.09147, pruned_loss=0.01291, audio_tagging_loss=0.008764, over 3046897.94 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:46:14,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3162186.6666666665, ans=0.1 2023-11-26 01:46:27,574 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=15.0 2023-11-26 01:46:29,654 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474350 2023-11-26 01:46:33,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3162320.0, ans=0.125 2023-11-26 01:47:02,407 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5450, loss[loss=0.04969, simple_loss=0.05932, pruned_loss=0.00927, audio_tagging_loss=0.01076, over 15639.00 frames. ], tot_loss[loss=0.06777, simple_loss=0.09199, pruned_loss=0.01302, audio_tagging_loss=0.008761, over 3048043.63 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:47:25,656 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474400 2023-11-26 01:47:25,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3162653.3333333335, ans=0.125 2023-11-26 01:47:31,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3162653.3333333335, ans=0.125 2023-11-26 01:47:38,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3162720.0, ans=0.125 2023-11-26 01:47:40,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3162720.0, ans=0.125 2023-11-26 01:47:41,656 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.978e+01 8.837e+01 9.304e+01 1.039e+02 1.325e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 01:47:58,527 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5500, loss[loss=0.04779, simple_loss=0.0639, pruned_loss=0.008401, audio_tagging_loss=0.007443, over 14740.00 frames. ], tot_loss[loss=0.06755, simple_loss=0.09155, pruned_loss=0.01304, audio_tagging_loss=0.008737, over 3049512.63 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:48:20,989 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474450 2023-11-26 01:48:21,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3162986.6666666665, ans=0.0 2023-11-26 01:48:22,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3162986.6666666665, ans=0.0 2023-11-26 01:48:54,834 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5550, loss[loss=0.07953, simple_loss=0.1114, pruned_loss=0.01435, audio_tagging_loss=0.009476, over 14467.00 frames. ], tot_loss[loss=0.06732, simple_loss=0.09079, pruned_loss=0.01295, audio_tagging_loss=0.008975, over 3047087.40 frames. 
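], batch size: 54, lr: 1.69e-03, grad_scale: 8.0

The lr: 1.69e-03 printed on every batch line matches what icefall's Eden schedule would give, assuming the usual rule lr = base_lr * ((batch^2 + lr_batches^2)/lr_batches^2)^(-1/4) * ((epoch^2 + lr_epochs^2)/lr_epochs^2)^(-1/4) with this run's base_lr=0.045, lr_batches=7500 and lr_epochs=3.5. A sketch:

```python
# Eden learning-rate rule (a sketch; the constants are this run's config,
# the batch/epoch values are read off the surrounding log lines).
def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# ~1.67e-03 at epoch 40; the logged 1.69e-03 corresponds to a slightly
# smaller effective (fractional) epoch value.
print(eden_lr(0.045, batch=474_500, epoch=40))
```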
2023-11-26 01:48:55,492 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.34 vs. limit=22.5 2023-11-26 01:48:56,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3163186.6666666665, ans=0.125 2023-11-26 01:49:14,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3163253.3333333335, ans=0.0 2023-11-26 01:49:14,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3163253.3333333335, ans=0.125 2023-11-26 01:49:16,210 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474500 2023-11-26 01:49:35,111 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.448e+01 8.884e+01 9.348e+01 1.017e+02 1.220e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-26 01:49:49,986 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5600, loss[loss=0.07236, simple_loss=0.1114, pruned_loss=0.01015, audio_tagging_loss=0.006483, over 16167.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09083, pruned_loss=0.01274, audio_tagging_loss=0.008987, over 3045559.48 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:50:04,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3163586.6666666665, ans=0.125 2023-11-26 01:50:07,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3163586.6666666665, ans=0.1 2023-11-26 01:50:08,346 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.98 vs. limit=6.0 2023-11-26 01:50:12,229 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474550 2023-11-26 01:50:18,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3163653.3333333335, ans=0.09899494936611666 2023-11-26 01:50:30,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3163720.0, ans=0.0 2023-11-26 01:50:31,298 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 01:50:37,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3163786.6666666665, ans=0.125 2023-11-26 01:50:46,150 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5650, loss[loss=0.06489, simple_loss=0.09093, pruned_loss=0.01213, audio_tagging_loss=0.00729, over 15973.00 frames. ], tot_loss[loss=0.06754, simple_loss=0.09136, pruned_loss=0.01284, audio_tagging_loss=0.009018, over 3055537.94 frames.
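], batch size: 59, lr: 1.69e-03, grad_scale: 16.0

The WARNING above drops a 1-second AudioSet clip: after the ~4x front-end subsampling its 100 feature frames shrink to 23, which is fewer than its 24 BPE tokens, and a transducer loss cannot align U tokens to T < U encoder frames. A sketch of that filter (names are illustrative; the exact subsampling arithmetic depends on encoder_embed, but this form matches the logged 100 -> 23):

```python
def frames_after_subsampling(num_frames: int) -> int:
    # One common form of the conv front-end reduction; matches 100 -> 23.
    return (num_frames - 7) // 4

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer needs at least as many encoder frames as output tokens.
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23
print(keep_cut(100, 24))              # False -> "Exclude cut ..." warning
```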
2023-11-26 01:50:59,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3163920.0, ans=0.125 2023-11-26 01:51:09,083 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474600 2023-11-26 01:51:14,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3163986.6666666665, ans=0.1 2023-11-26 01:51:26,859 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.372e+01 8.458e+01 9.048e+01 9.939e+01 1.364e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-26 01:51:42,954 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5700, loss[loss=0.07862, simple_loss=0.1153, pruned_loss=0.01774, audio_tagging_loss=0.003213, over 14462.00 frames. ], tot_loss[loss=0.06755, simple_loss=0.09133, pruned_loss=0.01285, audio_tagging_loss=0.009035, over 3047431.77 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:51:50,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3164186.6666666665, ans=0.035 2023-11-26 01:52:04,731 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474650 2023-11-26 01:52:05,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3164320.0, ans=0.0 2023-11-26 01:52:07,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3164320.0, ans=0.125 2023-11-26 01:52:20,183 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.63 vs. limit=15.0 2023-11-26 01:52:22,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3164386.6666666665, ans=0.125 2023-11-26 01:52:27,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3164453.3333333335, ans=0.0 2023-11-26 01:52:38,752 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5750, loss[loss=0.06596, simple_loss=0.08622, pruned_loss=0.01182, audio_tagging_loss=0.01103, over 14711.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09059, pruned_loss=0.0127, audio_tagging_loss=0.009086, over 3037443.25 frames.
], batch size: 56, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:52:58,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3164586.6666666665, ans=0.125 2023-11-26 01:53:01,313 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474700 2023-11-26 01:53:08,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3164653.3333333335, ans=0.0 2023-11-26 01:53:15,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3164720.0, ans=0.125 2023-11-26 01:53:15,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3164720.0, ans=0.0 2023-11-26 01:53:20,234 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.530e+01 8.765e+01 9.381e+01 1.013e+02 1.424e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-26 01:53:34,175 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.07 vs. limit=15.0 2023-11-26 01:53:34,544 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5800, loss[loss=0.07128, simple_loss=0.1042, pruned_loss=0.01254, audio_tagging_loss=0.006644, over 15658.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09041, pruned_loss=0.01253, audio_tagging_loss=0.008915, over 3037711.17 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:53:38,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3164853.3333333335, ans=0.125 2023-11-26 01:53:44,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3164920.0, ans=0.0 2023-11-26 01:53:57,405 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474750 2023-11-26 01:54:10,288 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 01:54:25,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3165120.0, ans=0.1 2023-11-26 01:54:30,606 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5850, loss[loss=0.0694, simple_loss=0.09881, pruned_loss=0.01185, audio_tagging_loss=0.008148, over 16478.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09048, pruned_loss=0.01253, audio_tagging_loss=0.008902, over 3037147.89 frames. ], batch size: 62, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:54:39,665 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.90 vs. limit=15.0 2023-11-26 01:54:47,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3165253.3333333335, ans=0.125 2023-11-26 01:54:52,374 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474800 2023-11-26 01:55:04,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3165386.6666666665, ans=0.0 2023-11-26 01:55:10,817 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.08 vs. 
limit=15.0 2023-11-26 01:55:11,937 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.931e+01 8.672e+01 9.332e+01 1.005e+02 2.095e+02, threshold=1.866e+02, percent-clipped=1.0 2023-11-26 01:55:17,716 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.08 vs. limit=15.0 2023-11-26 01:55:21,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3165453.3333333335, ans=0.125 2023-11-26 01:55:26,151 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5900, loss[loss=0.06182, simple_loss=0.08675, pruned_loss=0.01119, audio_tagging_loss=0.007251, over 15259.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09033, pruned_loss=0.01234, audio_tagging_loss=0.00883, over 3041826.78 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:55:30,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3165520.0, ans=0.2 2023-11-26 01:55:44,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3165586.6666666665, ans=0.125 2023-11-26 01:55:48,408 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474850 2023-11-26 01:56:02,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3165720.0, ans=0.1 2023-11-26 01:56:08,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3165720.0, ans=0.125 2023-11-26 01:56:18,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3165786.6666666665, ans=0.1 2023-11-26 01:56:21,178 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 5950, loss[loss=0.07132, simple_loss=0.1013, pruned_loss=0.0128, audio_tagging_loss=0.007872, over 14919.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09117, pruned_loss=0.01244, audio_tagging_loss=0.00874, over 3056752.54 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 01:56:44,131 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474900 2023-11-26 01:57:02,555 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 8.504e+01 9.073e+01 9.680e+01 1.067e+02, threshold=1.815e+02, percent-clipped=0.0 2023-11-26 01:57:16,946 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.56 vs. limit=22.5 2023-11-26 01:57:17,305 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6000, loss[loss=0.03786, simple_loss=0.04899, pruned_loss=0.005978, audio_tagging_loss=0.007384, over 14076.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09145, pruned_loss=0.01257, audio_tagging_loss=0.008722, over 3045958.74 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:57:17,307 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-26 01:57:33,038 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.2013, 4.9222, 4.5164, 4.7709], device='cuda:0') 2023-11-26 01:57:49,491 INFO [train_asr.py:1267] (0/4) Epoch 40, validation: loss=0.0577, simple_loss=0.05067, pruned_loss=0.005162, audio_tagging_loss=0.0272, over 4681554.00 frames. 
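The zipformer.py diagnostic above dumps attn_weights_entropy at validation time, presumably the Shannon entropy of each head's attention distribution, one value per head (this stack has four). Values near log(src_len) mean a head attends broadly; values near 0 mean it has collapsed onto single frames. A hedged sketch of that statistic, not the exact icefall code:

```python
import torch

def attn_weights_entropy(attn: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    # attn: (num_heads, tgt_len, src_len), each row a distribution over src.
    ent = -(attn * (attn + eps).log()).sum(dim=-1)  # (num_heads, tgt_len)
    return ent.mean(dim=-1)                         # one value per head

attn = torch.softmax(torch.randn(4, 100, 200), dim=-1)
print(attn_weights_entropy(attn))  # ~4.8 per head, same order as the log
```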
2023-11-26 01:57:49,492 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-26 01:57:53,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3166186.6666666665, ans=0.125 2023-11-26 01:57:56,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3166186.6666666665, ans=0.0 2023-11-26 01:58:07,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3166253.3333333335, ans=0.125 2023-11-26 01:58:12,980 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 474950 2023-11-26 01:58:14,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=3166320.0, ans=0.02 2023-11-26 01:58:30,857 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 01:58:34,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3166453.3333333335, ans=0.125 2023-11-26 01:58:34,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3166453.3333333335, ans=0.125 2023-11-26 01:58:45,671 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6050, loss[loss=0.07245, simple_loss=0.09592, pruned_loss=0.01441, audio_tagging_loss=0.01008, over 14598.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09165, pruned_loss=0.01263, audio_tagging_loss=0.008736, over 3047391.68 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 01:58:56,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3166586.6666666665, ans=0.1 2023-11-26 01:59:08,382 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475000 2023-11-26 01:59:27,535 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.819e+01 9.341e+01 9.960e+01 1.201e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 01:59:28,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3166720.0, ans=0.0 2023-11-26 01:59:30,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3166786.6666666665, ans=0.2 2023-11-26 01:59:31,553 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.20 vs. limit=15.0 2023-11-26 01:59:33,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3166786.6666666665, ans=0.1 2023-11-26 01:59:42,377 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6100, loss[loss=0.07865, simple_loss=0.1048, pruned_loss=0.01955, audio_tagging_loss=0.006714, over 15105.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09126, pruned_loss=0.01259, audio_tagging_loss=0.008697, over 3051556.41 frames. 
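], batch size: 57, lr: 1.69e-03, grad_scale: 16.0

In the optim.py:476 lines above, the reported threshold is 2.0x the median grad-norm quartile (e.g. 2.0 * 9.341e+01 = 1.868e+02 in the preceding entry), so Clipping_scale=2.0 appears to set the clipping threshold relative to the median of recently observed gradient norms. A sketch of that scheme (a generic reconstruction, not ScaledAdam's actual code):

```python
import collections, statistics

# Keep a window of recent gradient norms; clip anything above
# clipping_scale * median and track how often that happens.
class MedianClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 500):
        self.clipping_scale = clipping_scale
        self.norms = collections.deque(maxlen=window)
        self.clipped = 0
        self.total = 0

    def clip_factor(self, grad_norm: float) -> float:
        self.norms.append(grad_norm)
        self.total += 1
        threshold = self.clipping_scale * statistics.median(self.norms)
        if grad_norm > threshold:
            self.clipped += 1
            return threshold / grad_norm  # scale gradients down to threshold
        return 1.0
```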
2023-11-26 01:59:43,016 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.77 vs. limit=15.0 2023-11-26 01:59:52,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3166920.0, ans=0.0 2023-11-26 01:59:54,786 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.48 vs. limit=10.0 2023-11-26 01:59:55,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3166920.0, ans=0.125 2023-11-26 02:00:04,230 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475050 2023-11-26 02:00:05,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3166986.6666666665, ans=0.125 2023-11-26 02:00:05,916 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.49 vs. limit=15.0 2023-11-26 02:00:14,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3167053.3333333335, ans=0.09899494936611666 2023-11-26 02:00:37,605 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6150, loss[loss=0.06104, simple_loss=0.08166, pruned_loss=0.01199, audio_tagging_loss=0.008225, over 14597.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09086, pruned_loss=0.01264, audio_tagging_loss=0.008774, over 3051533.50 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:00:37,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3167186.6666666665, ans=0.125 2023-11-26 02:00:54,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3167253.3333333335, ans=0.125 2023-11-26 02:00:57,619 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.71 vs. limit=15.0 2023-11-26 02:00:58,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3167253.3333333335, ans=0.0 2023-11-26 02:01:00,322 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475100 2023-11-26 02:01:18,778 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.792e+01 9.265e+01 9.786e+01 1.351e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-26 02:01:33,675 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6200, loss[loss=0.0672, simple_loss=0.07806, pruned_loss=0.01616, audio_tagging_loss=0.01201, over 14838.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09082, pruned_loss=0.01268, audio_tagging_loss=0.008845, over 3045327.51 frames.
], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:01:56,161 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475150 2023-11-26 02:02:15,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3167720.0, ans=0.2 2023-11-26 02:02:30,247 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6250, loss[loss=0.06819, simple_loss=0.09086, pruned_loss=0.01466, audio_tagging_loss=0.008108, over 14270.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09037, pruned_loss=0.01268, audio_tagging_loss=0.008961, over 3045289.69 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:02:30,871 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.28 vs. limit=15.0 2023-11-26 02:02:51,577 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475200 2023-11-26 02:03:11,824 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.104e+01 8.743e+01 9.437e+01 1.009e+02 1.277e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 02:03:21,874 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.27 vs. limit=22.5 2023-11-26 02:03:25,459 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6300, loss[loss=0.03153, simple_loss=0.03322, pruned_loss=0.001361, audio_tagging_loss=0.01356, over 15103.00 frames. ], tot_loss[loss=0.068, simple_loss=0.09197, pruned_loss=0.01301, audio_tagging_loss=0.009002, over 3054288.13 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:03:33,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3168186.6666666665, ans=0.015 2023-11-26 02:03:43,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3168253.3333333335, ans=0.2 2023-11-26 02:03:48,497 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475250 2023-11-26 02:03:49,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3168320.0, ans=0.0 2023-11-26 02:04:11,878 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.65 vs. limit=12.0 2023-11-26 02:04:16,039 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.03 vs. limit=15.0 2023-11-26 02:04:17,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3168453.3333333335, ans=0.0 2023-11-26 02:04:20,892 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6350, loss[loss=0.06001, simple_loss=0.07945, pruned_loss=0.01029, audio_tagging_loss=0.009996, over 14978.00 frames. ], tot_loss[loss=0.06788, simple_loss=0.0917, pruned_loss=0.01298, audio_tagging_loss=0.009049, over 3051574.73 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:04:34,687 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.70 vs. 
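limit=15.0

The Whitening lines compare a per-module metric against a limit. The metric appears to measure how far the feature covariance C is from a multiple of the identity: one standard construction is num_channels * sum(C^2) / trace(C)^2, which by Cauchy-Schwarz is >= 1 with equality exactly for "white" features, and the module presumably only applies its penalty when the metric exceeds the limit. A hedged reconstruction, not icefall's exact code:

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels); returns >= 1, and == 1 iff the
    # channel covariance is a multiple of the identity.
    x = x - x.mean(dim=0)
    cov = x.t() @ x / x.shape[0]
    num_channels = cov.shape[0]
    return num_channels * (cov ** 2).sum() / cov.diagonal().sum() ** 2

print(float(whitening_metric(torch.randn(10000, 256))))  # ~1: white features
x = torch.randn(10000, 256) @ torch.randn(256, 256)      # correlated features
print(float(whitening_metric(x)))                        # noticeably above 1
```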
2023-11-26 02:04:44,269 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475300 2023-11-26 02:04:45,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3168653.3333333335, ans=0.2 2023-11-26 02:05:02,216 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.087e+01 8.674e+01 9.185e+01 1.006e+02 1.507e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-26 02:05:17,594 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6400, loss[loss=0.07193, simple_loss=0.1009, pruned_loss=0.01292, audio_tagging_loss=0.008559, over 15453.00 frames. ], tot_loss[loss=0.06774, simple_loss=0.0911, pruned_loss=0.01299, audio_tagging_loss=0.009198, over 3054214.34 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:05:33,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3168920.0, ans=0.0 2023-11-26 02:05:38,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3168986.6666666665, ans=0.07 2023-11-26 02:05:38,910 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475350 2023-11-26 02:06:12,563 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6450, loss[loss=0.05561, simple_loss=0.07382, pruned_loss=0.009653, audio_tagging_loss=0.009052, over 14630.00 frames. ], tot_loss[loss=0.06758, simple_loss=0.09087, pruned_loss=0.01295, audio_tagging_loss=0.009197, over 3043853.72 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:06:13,805 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:06:34,470 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475400 2023-11-26 02:06:39,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3169320.0, ans=0.1 2023-11-26 02:06:45,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3169320.0, ans=0.2 2023-11-26 02:06:55,473 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 8.666e+01 9.241e+01 9.984e+01 1.381e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-26 02:07:05,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3169453.3333333335, ans=0.125 2023-11-26 02:07:08,093 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6500, loss[loss=0.06624, simple_loss=0.09169, pruned_loss=0.01461, audio_tagging_loss=0.005778, over 15869.00 frames. ], tot_loss[loss=0.06794, simple_loss=0.09141, pruned_loss=0.01311, audio_tagging_loss=0.009126, over 3041384.10 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:07:16,664 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.58 vs.
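limit=15.0

The many scaling.py:213 lines print ScheduledFloat values: module hyper-parameters (dropout_p, skip rates, balancer probs, scale_min, ...) that are functions of the global batch count rather than fixed constants, with `ans` being the schedule's current output; by batch_count ~3.17e6 most have long since settled at their final values. A sketch of such a schedule as piecewise-linear interpolation (the schedule points below are made up for illustration):

```python
class ScheduledFloat:
    """A float hyper-parameter that is piecewise-linear in batch count."""

    def __init__(self, *points: tuple):
        self.points = sorted(points)  # (batch_count, value) pairs

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value(3_169_520.0))  # 0.1, as in the dropout_p lines above
```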
2023-11-26 02:07:17,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3169520.0, ans=0.1 2023-11-26 02:07:28,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3169586.6666666665, ans=0.125 2023-11-26 02:07:31,668 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475450 2023-11-26 02:07:39,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3169653.3333333335, ans=0.125 2023-11-26 02:07:41,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3169720.0, ans=0.0 2023-11-26 02:08:04,752 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6550, loss[loss=0.06649, simple_loss=0.09785, pruned_loss=0.01013, audio_tagging_loss=0.007429, over 14715.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09066, pruned_loss=0.01293, audio_tagging_loss=0.008993, over 3047600.55 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:08:06,998 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.07 vs. limit=12.0 2023-11-26 02:08:09,895 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.67 vs. limit=12.0 2023-11-26 02:08:27,232 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475500 2023-11-26 02:08:44,701 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.88 vs. limit=15.0 2023-11-26 02:08:47,307 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.189e+01 8.547e+01 9.134e+01 1.014e+02 1.239e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-26 02:08:57,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3170120.0, ans=0.05 2023-11-26 02:09:00,757 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6600, loss[loss=0.05929, simple_loss=0.08273, pruned_loss=0.008896, audio_tagging_loss=0.009029, over 14584.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09086, pruned_loss=0.01274, audio_tagging_loss=0.008929, over 3050071.59 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:09:09,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3170186.6666666665, ans=0.125 2023-11-26 02:09:16,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3170253.3333333335, ans=0.125 2023-11-26 02:09:22,492 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475550 2023-11-26 02:09:29,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3170320.0, ans=0.2 2023-11-26 02:09:55,782 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6650, loss[loss=0.08393, simple_loss=0.1218, pruned_loss=0.01841, audio_tagging_loss=0.004627, over 16793.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09074, pruned_loss=0.01259, audio_tagging_loss=0.00879, over 3050708.04 frames.
], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:10:18,513 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:10:18,725 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.89 vs. limit=15.0 2023-11-26 02:10:19,358 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475600 2023-11-26 02:10:28,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3170653.3333333335, ans=0.0 2023-11-26 02:10:38,689 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.085e+01 8.810e+01 9.306e+01 1.020e+02 1.538e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 02:10:39,504 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.45 vs. limit=15.0 2023-11-26 02:10:43,979 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.90 vs. limit=15.0 2023-11-26 02:10:49,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3170786.6666666665, ans=0.125 2023-11-26 02:10:51,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3170853.3333333335, ans=0.125 2023-11-26 02:10:52,577 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6700, loss[loss=0.06487, simple_loss=0.08687, pruned_loss=0.01242, audio_tagging_loss=0.009008, over 15195.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09056, pruned_loss=0.01256, audio_tagging_loss=0.008737, over 3049279.77 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:10:55,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3170853.3333333335, ans=0.125 2023-11-26 02:11:14,722 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475650 2023-11-26 02:11:19,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3170986.6666666665, ans=0.95 2023-11-26 02:11:31,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3171053.3333333335, ans=0.125 2023-11-26 02:11:35,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3171053.3333333335, ans=0.1 2023-11-26 02:11:42,427 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=15.0 2023-11-26 02:11:48,605 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6750, loss[loss=0.06676, simple_loss=0.08957, pruned_loss=0.01227, audio_tagging_loss=0.009709, over 15665.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09091, pruned_loss=0.0125, audio_tagging_loss=0.0087, over 3042435.90 frames. 
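], batch size: 60, lr: 1.69e-03, grad_scale: 16.0

The grad_scale field bouncing between 8.0, 16.0 and 32.0 across the batches above is ordinary dynamic fp16 loss scaling (the run trains with fp16 and restores a grad scaler state): the scale doubles after a long enough run of overflow-free steps and is halved whenever gradients overflow. A generic torch.cuda.amp sketch of the mechanism, not the training script itself:

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=8.0, growth_factor=2.0, backoff_factor=0.5,
    growth_interval=2000)

def train_step(model, optimizer, batch, criterion):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(batch["inputs"]), batch["targets"])
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # skipped internally if any grad overflowed
    scaler.update()         # doubles or halves the scale as needed
    return loss.detach(), scaler.get_scale()  # the logged grad_scale
```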
2023-11-26 02:12:10,180 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475700 2023-11-26 02:12:30,953 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.458e+01 8.571e+01 9.016e+01 1.004e+02 1.567e+02, threshold=1.803e+02, percent-clipped=0.0 2023-11-26 02:12:35,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3171453.3333333335, ans=0.2 2023-11-26 02:12:42,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3171520.0, ans=0.5 2023-11-26 02:12:43,783 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6800, loss[loss=0.05329, simple_loss=0.0676, pruned_loss=0.007712, audio_tagging_loss=0.01177, over 14846.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08958, pruned_loss=0.01223, audio_tagging_loss=0.00882, over 3040214.17 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:12:47,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3171520.0, ans=0.1 2023-11-26 02:12:50,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3171520.0, ans=0.1 2023-11-26 02:13:06,604 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475750 2023-11-26 02:13:23,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3171720.0, ans=0.125 2023-11-26 02:13:39,501 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6850, loss[loss=0.06018, simple_loss=0.07485, pruned_loss=0.01112, audio_tagging_loss=0.01164, over 15976.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08948, pruned_loss=0.01215, audio_tagging_loss=0.008793, over 3045736.39 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:13:43,787 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.13 vs. limit=15.0 2023-11-26 02:13:44,672 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.06 vs.
limit=22.5 2023-11-26 02:13:50,328 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:13:55,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3171920.0, ans=0.125 2023-11-26 02:13:57,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3171920.0, ans=0.125 2023-11-26 02:14:02,216 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475800 2023-11-26 02:14:03,867 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:14:14,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3172053.3333333335, ans=0.125 2023-11-26 02:14:20,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3172053.3333333335, ans=0.0 2023-11-26 02:14:22,091 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.977e+01 8.606e+01 9.440e+01 1.002e+02 1.257e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-26 02:14:22,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3172053.3333333335, ans=0.125 2023-11-26 02:14:22,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3172053.3333333335, ans=22.5 2023-11-26 02:14:35,836 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6900, loss[loss=0.08342, simple_loss=0.1069, pruned_loss=0.01986, audio_tagging_loss=0.0101, over 14067.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09015, pruned_loss=0.01228, audio_tagging_loss=0.008779, over 3044650.13 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:14:36,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3172186.6666666665, ans=0.125 2023-11-26 02:14:40,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3172186.6666666665, ans=0.125 2023-11-26 02:14:47,047 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:14:47,507 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.60 vs. limit=6.0 2023-11-26 02:14:58,174 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475850 2023-11-26 02:15:17,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3172386.6666666665, ans=0.2 2023-11-26 02:15:19,175 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 02:15:31,307 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 6950, loss[loss=0.06575, simple_loss=0.08982, pruned_loss=0.01077, audio_tagging_loss=0.01007, over 15257.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.0905, pruned_loss=0.01242, audio_tagging_loss=0.008835, over 3039247.54 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:15:35,284 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.62 vs. limit=15.0 2023-11-26 02:15:49,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3172586.6666666665, ans=0.1 2023-11-26 02:15:54,164 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475900 2023-11-26 02:16:07,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3172720.0, ans=0.5 2023-11-26 02:16:14,713 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.522e+01 9.109e+01 9.823e+01 1.262e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-26 02:16:22,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3172786.6666666665, ans=0.2 2023-11-26 02:16:26,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3172853.3333333335, ans=0.125 2023-11-26 02:16:26,960 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7000, loss[loss=0.05215, simple_loss=0.06769, pruned_loss=0.01003, audio_tagging_loss=0.008273, over 14457.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09047, pruned_loss=0.0124, audio_tagging_loss=0.008866, over 3038948.50 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:16:27,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3172853.3333333335, ans=0.1 2023-11-26 02:16:49,061 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 475950 2023-11-26 02:16:55,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3172986.6666666665, ans=0.125 2023-11-26 02:17:15,063 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.52 vs. limit=10.0 2023-11-26 02:17:22,599 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7050, loss[loss=0.07413, simple_loss=0.09772, pruned_loss=0.01369, audio_tagging_loss=0.01158, over 15016.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08955, pruned_loss=0.01243, audio_tagging_loss=0.009011, over 3037361.10 frames. 
], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:17:35,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3173253.3333333335, ans=0.125 2023-11-26 02:17:35,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3173253.3333333335, ans=0.125 2023-11-26 02:17:44,238 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476000 2023-11-26 02:17:45,532 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-476000.pt 2023-11-26 02:18:05,306 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.51 vs. limit=15.0 2023-11-26 02:18:08,093 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.940e+01 8.411e+01 9.041e+01 9.968e+01 1.223e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-26 02:18:14,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3173453.3333333335, ans=0.1 2023-11-26 02:18:14,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3173453.3333333335, ans=0.025 2023-11-26 02:18:19,682 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7100, loss[loss=0.05333, simple_loss=0.07049, pruned_loss=0.007367, audio_tagging_loss=0.01072, over 15495.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08932, pruned_loss=0.01239, audio_tagging_loss=0.009012, over 3037747.62 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:18:40,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3173586.6666666665, ans=0.0 2023-11-26 02:18:42,519 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476050 2023-11-26 02:18:50,499 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:18:50,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3173653.3333333335, ans=0.125 2023-11-26 02:18:53,752 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:18:55,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3173720.0, ans=0.0 2023-11-26 02:19:15,655 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7150, loss[loss=0.07451, simple_loss=0.1007, pruned_loss=0.01566, audio_tagging_loss=0.008519, over 15918.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08886, pruned_loss=0.01228, audio_tagging_loss=0.009134, over 3037393.06 frames. 
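], batch size: 61, lr: 1.69e-03, grad_scale: 16.0

The checkpoint-476000.pt saved above lands exactly on a multiple of this run's save_every_n=4000, so batch-level checkpoints are written whenever the global batch index crosses that interval, with only the most recent keep_last_k=30 retained. A sketch of that policy (names are illustrative, not icefall's API):

```python
from pathlib import Path
import torch

SAVE_EVERY_N = 4000
KEEP_LAST_K = 30

def maybe_save_checkpoint(exp_dir: Path, batch_idx_train: int,
                          state: dict) -> None:
    if batch_idx_train == 0 or batch_idx_train % SAVE_EVERY_N != 0:
        return
    torch.save(state, exp_dir / f"checkpoint-{batch_idx_train}.pt")
    ckpts = sorted(exp_dir.glob("checkpoint-*.pt"),
                   key=lambda p: int(p.stem.split("-")[1]))
    for old in ckpts[:-KEEP_LAST_K]:  # prune everything but the newest 30
        old.unlink()
```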
2023-11-26 02:19:20,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3173853.3333333335, ans=0.125 2023-11-26 02:19:23,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3173853.3333333335, ans=0.0 2023-11-26 02:19:31,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3173920.0, ans=0.0 2023-11-26 02:19:37,940 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476100 2023-11-26 02:19:48,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3174053.3333333335, ans=0.0 2023-11-26 02:19:52,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3174053.3333333335, ans=0.0 2023-11-26 02:19:58,943 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.217e+01 8.796e+01 9.304e+01 9.946e+01 1.523e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 02:20:10,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3174186.6666666665, ans=0.0 2023-11-26 02:20:11,673 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7200, loss[loss=0.05959, simple_loss=0.07718, pruned_loss=0.01024, audio_tagging_loss=0.01076, over 16043.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08887, pruned_loss=0.01215, audio_tagging_loss=0.00919, over 3041211.78 frames. ], batch size: 62, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:20:15,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3174186.6666666665, ans=0.125 2023-11-26 02:20:18,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3174186.6666666665, ans=0.1 2023-11-26 02:20:33,552 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476150 2023-11-26 02:20:36,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3174320.0, ans=0.125 2023-11-26 02:21:00,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3174453.3333333335, ans=0.125 2023-11-26 02:21:06,650 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7250, loss[loss=0.04927, simple_loss=0.06206, pruned_loss=0.006925, audio_tagging_loss=0.01132, over 16062.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.08962, pruned_loss=0.01233, audio_tagging_loss=0.009309, over 3037473.12 frames.
], batch size: 62, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:21:26,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3174586.6666666665, ans=0.125 2023-11-26 02:21:29,712 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476200 2023-11-26 02:21:31,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3174653.3333333335, ans=0.0 2023-11-26 02:21:38,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3174653.3333333335, ans=0.125 2023-11-26 02:21:50,265 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.26 vs. limit=15.0 2023-11-26 02:21:51,695 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.120e+01 8.421e+01 9.196e+01 9.750e+01 1.203e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-26 02:21:57,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3174786.6666666665, ans=0.0 2023-11-26 02:22:02,907 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7300, loss[loss=0.06792, simple_loss=0.08563, pruned_loss=0.01516, audio_tagging_loss=0.009944, over 15179.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.0901, pruned_loss=0.01241, audio_tagging_loss=0.009113, over 3045684.38 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:22:04,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=3174853.3333333335, ans=0.95 2023-11-26 02:22:08,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3174853.3333333335, ans=0.0 2023-11-26 02:22:20,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3174920.0, ans=0.125 2023-11-26 02:22:25,748 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476250 2023-11-26 02:22:27,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3174986.6666666665, ans=0.125 2023-11-26 02:22:58,974 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7350, loss[loss=0.06109, simple_loss=0.08649, pruned_loss=0.009474, audio_tagging_loss=0.008369, over 15145.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.08981, pruned_loss=0.01248, audio_tagging_loss=0.00904, over 3047425.19 frames. 
], batch size: 57, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 02:23:00,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3175186.6666666665, ans=0.0 2023-11-26 02:23:10,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3175253.3333333335, ans=0.1 2023-11-26 02:23:13,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3175253.3333333335, ans=0.125 2023-11-26 02:23:14,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3175253.3333333335, ans=0.04949747468305833 2023-11-26 02:23:15,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3175253.3333333335, ans=0.125 2023-11-26 02:23:20,504 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476300 2023-11-26 02:23:27,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3175320.0, ans=0.2 2023-11-26 02:23:44,883 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.015e+01 8.628e+01 9.246e+01 9.778e+01 1.248e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-26 02:23:54,428 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7400, loss[loss=0.05715, simple_loss=0.07563, pruned_loss=0.01016, audio_tagging_loss=0.009164, over 14443.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08838, pruned_loss=0.0123, audio_tagging_loss=0.008979, over 3038171.50 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 02:24:06,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3175586.6666666665, ans=0.1 2023-11-26 02:24:16,196 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476350 2023-11-26 02:24:21,408 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.62 vs. limit=22.5 2023-11-26 02:24:25,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3175653.3333333335, ans=0.0 2023-11-26 02:24:49,166 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7450, loss[loss=0.07179, simple_loss=0.09083, pruned_loss=0.0168, audio_tagging_loss=0.009576, over 14416.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08853, pruned_loss=0.01248, audio_tagging_loss=0.008839, over 3036379.87 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 02:24:51,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3175853.3333333335, ans=0.1 2023-11-26 02:25:12,743 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476400 2023-11-26 02:25:33,550 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.91 vs. 
limit=15.0 2023-11-26 02:25:35,539 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.103e+01 8.524e+01 9.234e+01 9.933e+01 1.379e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-26 02:25:38,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3176120.0, ans=0.5 2023-11-26 02:25:46,036 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7500, loss[loss=0.07103, simple_loss=0.09375, pruned_loss=0.01504, audio_tagging_loss=0.009111, over 15237.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.08965, pruned_loss=0.0126, audio_tagging_loss=0.008806, over 3046524.24 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 02:25:56,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3176253.3333333335, ans=0.125 2023-11-26 02:26:02,217 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.64 vs. limit=15.0 2023-11-26 02:26:07,822 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476450 2023-11-26 02:26:12,559 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.95 vs. limit=15.0 2023-11-26 02:26:24,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3176386.6666666665, ans=0.0 2023-11-26 02:26:29,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3176453.3333333335, ans=0.0 2023-11-26 02:26:41,538 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7550, loss[loss=0.06678, simple_loss=0.08753, pruned_loss=0.01451, audio_tagging_loss=0.00851, over 16076.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09047, pruned_loss=0.01283, audio_tagging_loss=0.00873, over 3050155.87 frames. 
], batch size: 60, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 02:26:45,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3176520.0, ans=0.0 2023-11-26 02:26:50,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3176520.0, ans=0.0 2023-11-26 02:26:53,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3176586.6666666665, ans=0.2 2023-11-26 02:26:57,542 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:26:58,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3176586.6666666665, ans=0.0 2023-11-26 02:27:03,131 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476500 2023-11-26 02:27:21,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3176720.0, ans=0.2 2023-11-26 02:27:26,803 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.676e+01 8.615e+01 8.990e+01 9.647e+01 1.278e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-26 02:27:26,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3176786.6666666665, ans=0.2 2023-11-26 02:27:36,402 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7600, loss[loss=0.05764, simple_loss=0.07568, pruned_loss=0.009925, audio_tagging_loss=0.009876, over 14470.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08899, pruned_loss=0.0125, audio_tagging_loss=0.008817, over 3040489.63 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:27:56,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3176920.0, ans=0.125 2023-11-26 02:27:59,756 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476550 2023-11-26 02:28:07,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3176986.6666666665, ans=0.125 2023-11-26 02:28:17,132 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.18 vs. limit=15.0 2023-11-26 02:28:20,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3177120.0, ans=0.1 2023-11-26 02:28:31,837 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7650, loss[loss=0.0753, simple_loss=0.1011, pruned_loss=0.01464, audio_tagging_loss=0.01013, over 15378.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08877, pruned_loss=0.01251, audio_tagging_loss=0.008743, over 3039597.53 frames. 
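
Note: the loss[...] fields decompose consistently throughout this log as loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss (the simple/pruned pair of the pruned transducer plus the audio-tagging term); the weights 0.5 and 1.0 are inferred from the logged numbers themselves. Checking the batch 7600 entry above:

import math

# loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss,
# verified against the "batch 7600" entry in this log.
simple, pruned, tagging = 0.07568, 0.009925, 0.009876
total = 0.5 * simple + pruned + 1.0 * tagging
assert math.isclose(total, 0.05764, rel_tol=1e-3)
print(total)  # ~0.05764
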
], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:28:32,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3177186.6666666665, ans=0.125 2023-11-26 02:28:54,255 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476600 2023-11-26 02:29:03,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3177320.0, ans=0.0 2023-11-26 02:29:16,405 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:29:18,277 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.997e+01 8.582e+01 9.244e+01 1.012e+02 1.285e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-26 02:29:18,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3177453.3333333335, ans=0.125 2023-11-26 02:29:28,320 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7700, loss[loss=0.06084, simple_loss=0.07443, pruned_loss=0.01342, audio_tagging_loss=0.0102, over 14038.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08885, pruned_loss=0.01249, audio_tagging_loss=0.00879, over 3034842.83 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:29:41,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3177586.6666666665, ans=0.0 2023-11-26 02:29:42,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3177586.6666666665, ans=0.125 2023-11-26 02:29:47,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3177586.6666666665, ans=0.125 2023-11-26 02:29:50,126 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476650 2023-11-26 02:29:52,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3177653.3333333335, ans=0.125 2023-11-26 02:29:52,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3177653.3333333335, ans=0.2 2023-11-26 02:29:52,916 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.90 vs. limit=10.0 2023-11-26 02:30:23,203 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7750, loss[loss=0.0595, simple_loss=0.08133, pruned_loss=0.01036, audio_tagging_loss=0.008471, over 14736.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08857, pruned_loss=0.01247, audio_tagging_loss=0.008941, over 3033783.05 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:30:43,907 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:30:45,806 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476700 2023-11-26 02:30:55,942 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.15 vs. 
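
Note: the float batch_count in the ScheduledFloat and Whitening entries is not the raw batch index. Dividing, say, batch_count=3177186.67 by the concurrent batch idx 476600 gives about 6.666, i.e. 4 * 5/3, which suggests a duration-normalized step count summed over the 4 ranks (every entry carries the "(0/4)" rank prefix), with each rank contributing roughly batch_duration / reference_duration per step. This relation is inferred from the numbers, not read from the code:

# Inferred (not authoritative) relation between the integer batch index
# and the float batch_count that drives the schedules:
#   batch_count ~= batch_idx * world_size * (batch_duration / ref_duration)
world_size = 4               # the "(0/4)" prefix on every entry
per_rank_increment = 5 / 3   # ~1.667, inferred from the logged ratio
batch_idx = 476600
print(batch_idx * world_size * per_rank_increment)  # ~3177333, near 3177186.67
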
limit=12.0 2023-11-26 02:30:58,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3178053.3333333335, ans=0.2 2023-11-26 02:31:07,836 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.997e+01 8.684e+01 9.280e+01 1.001e+02 1.211e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 02:31:17,262 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.01 vs. limit=15.0 2023-11-26 02:31:17,894 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7800, loss[loss=0.06712, simple_loss=0.08936, pruned_loss=0.01224, audio_tagging_loss=0.0102, over 14920.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08916, pruned_loss=0.01245, audio_tagging_loss=0.008848, over 3040186.57 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:31:28,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3178186.6666666665, ans=0.125 2023-11-26 02:31:32,681 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.66 vs. limit=22.5 2023-11-26 02:31:40,633 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476750 2023-11-26 02:31:40,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3178320.0, ans=0.125 2023-11-26 02:31:45,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3178320.0, ans=10.0 2023-11-26 02:31:46,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3178320.0, ans=0.0 2023-11-26 02:31:55,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3178386.6666666665, ans=0.0 2023-11-26 02:31:58,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3178386.6666666665, ans=0.025 2023-11-26 02:32:12,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3178453.3333333335, ans=0.0 2023-11-26 02:32:14,219 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7850, loss[loss=0.05808, simple_loss=0.08409, pruned_loss=0.00875, audio_tagging_loss=0.007282, over 15232.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08927, pruned_loss=0.01237, audio_tagging_loss=0.00897, over 3036768.97 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:32:32,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3178586.6666666665, ans=0.2 2023-11-26 02:32:35,295 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476800 2023-11-26 02:32:51,594 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.48 vs. 
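
Note: the scaling.py:1022 Whitening lines measure how far a module's activations are from having a white (isotropic) covariance within each channel group: a metric of 1.0 means all covariance eigenvalues are equal, larger values mean more anisotropy, and the penalty only engages when the metric exceeds the limit (which can itself be scheduled, as the whitening_limit ScheduledFloat entry above shows). One way to express such a metric, as a hedged reconstruction rather than the exact icefall formula:

import torch

# Eigenvalue-spread whitening metric: 1.0 iff the group covariance is
# isotropic; larger values mean energy concentrated in fewer directions.
def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    n, c = x.shape                        # (frames, channels)
    g = c // num_groups
    metrics = []
    for i in range(num_groups):
        xg = x[:, i * g:(i + 1) * g]
        cov = xg.T @ xg / n
        lam = torch.linalg.eigvalsh(cov)  # eigenvalues, all >= 0
        metrics.append(g * (lam ** 2).sum() / lam.sum() ** 2)
    return torch.stack(metrics).mean().item()

x = torch.randn(1000, 384)
print(whitening_metric(x, num_groups=1))  # ~1.4 here, just sampling noise
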
limit=10.0 2023-11-26 02:32:59,766 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.148e+01 8.601e+01 9.556e+01 1.008e+02 1.371e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 02:33:05,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3178786.6666666665, ans=0.0 2023-11-26 02:33:09,242 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7900, loss[loss=0.06224, simple_loss=0.08013, pruned_loss=0.013, audio_tagging_loss=0.009178, over 15963.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08923, pruned_loss=0.0123, audio_tagging_loss=0.009023, over 3042465.36 frames. ], batch size: 61, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:33:09,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3178853.3333333335, ans=0.0 2023-11-26 02:33:17,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3178853.3333333335, ans=0.0 2023-11-26 02:33:22,016 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.89 vs. limit=15.0 2023-11-26 02:33:27,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3178920.0, ans=0.125 2023-11-26 02:33:31,535 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476850 2023-11-26 02:33:40,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3178986.6666666665, ans=0.125 2023-11-26 02:33:46,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3179053.3333333335, ans=0.125 2023-11-26 02:34:04,792 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 7950, loss[loss=0.0552, simple_loss=0.06905, pruned_loss=0.01054, audio_tagging_loss=0.01013, over 15341.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.08988, pruned_loss=0.01242, audio_tagging_loss=0.009006, over 3046137.33 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:34:06,434 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.86 vs. limit=10.0 2023-11-26 02:34:19,496 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 02:34:22,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3179253.3333333335, ans=0.125 2023-11-26 02:34:22,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3179253.3333333335, ans=0.2 2023-11-26 02:34:23,980 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. 
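
Note: the WARNING at train_asr.py:1481 above drops an AudioSet cut whose transcript is a dummy placeholder: the 1-second cut has 100 feature frames, the convolutional frontend reduces them to 23, and 23 encoder frames cannot align 24 BPE tokens, so the transducer loss would be ill-defined. The 100 -> 23 reduction matches the usual two-stride-2-convolutions formula; the check below is a plausible reconstruction, not a quote of the source:

def frames_after_subsampling(t: int) -> int:
    # two conv layers with stride 2 (overall subsampling factor ~4)
    return ((t - 7) // 2 + 1) // 2

num_frames, num_tokens = 100, 24
t = frames_after_subsampling(num_frames)      # -> 23
if t < num_tokens:
    print("Exclude cut: fewer frames after subsampling than tokens")
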
limit=6.0 2023-11-26 02:34:25,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3179253.3333333335, ans=0.0 2023-11-26 02:34:26,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3179320.0, ans=0.125 2023-11-26 02:34:27,438 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476900 2023-11-26 02:34:36,556 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.76 vs. limit=12.0 2023-11-26 02:34:39,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3179386.6666666665, ans=0.1 2023-11-26 02:34:48,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3179453.3333333335, ans=0.0 2023-11-26 02:34:50,081 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.398e+01 8.906e+01 9.549e+01 1.026e+02 1.284e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-26 02:34:50,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3179453.3333333335, ans=0.0 2023-11-26 02:35:00,599 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8000, loss[loss=0.08781, simple_loss=0.129, pruned_loss=0.01517, audio_tagging_loss=0.00817, over 16096.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09024, pruned_loss=0.0126, audio_tagging_loss=0.00914, over 3043006.18 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:35:00,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3179520.0, ans=0.125 2023-11-26 02:35:22,358 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 476950 2023-11-26 02:35:27,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3179653.3333333335, ans=0.1 2023-11-26 02:35:29,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3179653.3333333335, ans=0.125 2023-11-26 02:35:56,009 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8050, loss[loss=0.08448, simple_loss=0.1216, pruned_loss=0.01596, audio_tagging_loss=0.007748, over 16423.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09087, pruned_loss=0.01264, audio_tagging_loss=0.008999, over 3043221.15 frames. 
], batch size: 59, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:36:00,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3179853.3333333335, ans=0.0 2023-11-26 02:36:11,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3179920.0, ans=0.125 2023-11-26 02:36:18,277 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477000 2023-11-26 02:36:23,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3179986.6666666665, ans=0.1 2023-11-26 02:36:23,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3179986.6666666665, ans=0.2 2023-11-26 02:36:36,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3180053.3333333335, ans=0.2 2023-11-26 02:36:41,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3180120.0, ans=0.0 2023-11-26 02:36:41,817 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.105e+01 8.827e+01 9.529e+01 1.030e+02 1.385e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 02:36:51,849 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8100, loss[loss=0.07049, simple_loss=0.1038, pruned_loss=0.01153, audio_tagging_loss=0.007066, over 15440.00 frames. ], tot_loss[loss=0.06744, simple_loss=0.09125, pruned_loss=0.01286, audio_tagging_loss=0.008957, over 3046312.34 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:37:01,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3180186.6666666665, ans=0.2 2023-11-26 02:37:14,060 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477050 2023-11-26 02:37:14,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3180320.0, ans=0.125 2023-11-26 02:37:47,791 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8150, loss[loss=0.06288, simple_loss=0.08232, pruned_loss=0.01013, audio_tagging_loss=0.01159, over 15662.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09118, pruned_loss=0.01292, audio_tagging_loss=0.008837, over 3044601.92 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:38:04,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3180586.6666666665, ans=0.2 2023-11-26 02:38:09,404 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477100 2023-11-26 02:38:33,157 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.717e+01 8.566e+01 9.339e+01 1.007e+02 1.243e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 02:38:43,161 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8200, loss[loss=0.06484, simple_loss=0.09091, pruned_loss=0.01118, audio_tagging_loss=0.008214, over 14255.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09036, pruned_loss=0.01255, audio_tagging_loss=0.008749, over 3046098.38 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:38:45,260 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
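
Note: the model.py:807 lines fire every 50 batches and confirm that the encoder is not frozen. In a distillation setup like this one the encoder can optionally be frozen for the first N optimizer steps; a minimal sketch of that gate, with all names illustrative:

# Sketch of an optional encoder freeze keyed on the step count
# (a negative freeze_encoder_steps disables it; names are illustrative).
def maybe_freeze_encoder(model, batch_idx, freeze_encoder_steps=-1):
    freeze = 0 <= batch_idx < freeze_encoder_steps
    for p in model.encoder.parameters():
        p.requires_grad = not freeze
    if batch_idx % 50 == 0:
        print(f"Freeze_encoder: {freeze}; Current batch idx: {batch_idx}")
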
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 02:38:54,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3180920.0, ans=0.125 2023-11-26 02:39:00,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3180920.0, ans=0.125 2023-11-26 02:39:04,156 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.34 vs. limit=15.0 2023-11-26 02:39:04,723 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477150 2023-11-26 02:39:32,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3181120.0, ans=0.125 2023-11-26 02:39:34,371 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.48 vs. limit=22.5 2023-11-26 02:39:37,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3181186.6666666665, ans=0.125 2023-11-26 02:39:38,075 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8250, loss[loss=0.07151, simple_loss=0.09716, pruned_loss=0.01455, audio_tagging_loss=0.008378, over 16840.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09013, pruned_loss=0.01248, audio_tagging_loss=0.008759, over 3056660.19 frames. ], batch size: 61, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:39:48,555 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.14 vs. limit=15.0 2023-11-26 02:39:58,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3181253.3333333335, ans=0.0 2023-11-26 02:40:00,629 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477200 2023-11-26 02:40:18,733 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.62 vs. limit=22.5 2023-11-26 02:40:23,954 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.842e+01 8.407e+01 9.078e+01 9.588e+01 1.625e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-26 02:40:26,782 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.85 vs. limit=6.0 2023-11-26 02:40:33,985 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8300, loss[loss=0.07081, simple_loss=0.1009, pruned_loss=0.01255, audio_tagging_loss=0.007792, over 15122.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09042, pruned_loss=0.01252, audio_tagging_loss=0.008735, over 3056382.40 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:40:49,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3181586.6666666665, ans=0.2 2023-11-26 02:40:52,717 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.34 vs. 
limit=15.0 2023-11-26 02:40:56,322 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477250 2023-11-26 02:41:11,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3181720.0, ans=0.2 2023-11-26 02:41:22,071 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.96 vs. limit=6.0 2023-11-26 02:41:29,717 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8350, loss[loss=0.04877, simple_loss=0.06784, pruned_loss=0.006267, audio_tagging_loss=0.008585, over 15017.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09015, pruned_loss=0.01248, audio_tagging_loss=0.008689, over 3057889.58 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:41:41,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3181920.0, ans=0.0 2023-11-26 02:41:52,075 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477300 2023-11-26 02:41:58,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3181986.6666666665, ans=0.125 2023-11-26 02:41:59,557 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.45 vs. limit=22.5 2023-11-26 02:42:02,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3182053.3333333335, ans=0.125 2023-11-26 02:42:16,198 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.210e+01 8.717e+01 9.531e+01 1.032e+02 1.340e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 02:42:20,407 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.19 vs. limit=22.5 2023-11-26 02:42:23,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3182120.0, ans=0.0 2023-11-26 02:42:25,229 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8400, loss[loss=0.04636, simple_loss=0.0664, pruned_loss=0.007684, audio_tagging_loss=0.005477, over 15031.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08998, pruned_loss=0.01241, audio_tagging_loss=0.008635, over 3057360.26 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:42:31,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3182186.6666666665, ans=0.2 2023-11-26 02:42:35,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3182253.3333333335, ans=0.0 2023-11-26 02:42:41,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3182253.3333333335, ans=0.1 2023-11-26 02:42:47,853 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477350 2023-11-26 02:42:49,116 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:43:14,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=3182453.3333333335, ans=0.02 2023-11-26 02:43:16,742 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.99 vs. 
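
Note: grad_scale in the loss lines tracks fp16 loss scaling. It sits at 8.0 early in this section, doubles to 16.0 around batch 7600 and to 32.0 around batch 8000, drops back to 16.0 by batch 8350 (halving is the standard GradScaler response to an inf/nan step), and is back at 32.0 by batch 8400, which suggests the recipe also re-grows the scale on a much shorter interval than PyTorch's default. A generic sketch of the mechanism, not the recipe's exact policy:

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=8.0)

def training_step(model, optimizer, compute_loss, batch):
    # compute_loss is an assumed helper returning a scalar loss
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped if any gradient is inf/nan
    scaler.update()          # halves the scale on overflow, grows it
                             # again after a run of clean steps
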
limit=22.5 2023-11-26 02:43:20,978 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8450, loss[loss=0.07067, simple_loss=0.08163, pruned_loss=0.01703, audio_tagging_loss=0.01283, over 14809.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09012, pruned_loss=0.01248, audio_tagging_loss=0.008638, over 3051277.90 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:43:42,524 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477400 2023-11-26 02:43:51,290 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.92 vs. limit=15.0 2023-11-26 02:43:59,126 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.69 vs. limit=10.0 2023-11-26 02:43:59,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3182720.0, ans=0.125 2023-11-26 02:44:03,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3182720.0, ans=0.125 2023-11-26 02:44:09,233 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.467e+01 8.762e+01 9.219e+01 9.767e+01 1.234e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-26 02:44:16,647 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8500, loss[loss=0.06379, simple_loss=0.08979, pruned_loss=0.01308, audio_tagging_loss=0.005807, over 15443.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08988, pruned_loss=0.01252, audio_tagging_loss=0.008718, over 3053850.25 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:44:38,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3182986.6666666665, ans=0.125 2023-11-26 02:44:39,004 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477450 2023-11-26 02:44:43,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3182986.6666666665, ans=0.1 2023-11-26 02:44:51,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3183053.3333333335, ans=0.2 2023-11-26 02:45:05,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3183120.0, ans=0.125 2023-11-26 02:45:10,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3183186.6666666665, ans=0.125 2023-11-26 02:45:11,607 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8550, loss[loss=0.04998, simple_loss=0.06285, pruned_loss=0.00924, audio_tagging_loss=0.009314, over 15369.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08938, pruned_loss=0.01249, audio_tagging_loss=0.008777, over 3048860.86 frames. 
], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:45:25,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3183253.3333333335, ans=0.125 2023-11-26 02:45:28,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3183253.3333333335, ans=0.125 2023-11-26 02:45:34,885 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477500 2023-11-26 02:45:39,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3183320.0, ans=0.125 2023-11-26 02:45:41,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3183320.0, ans=0.125 2023-11-26 02:45:51,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3183386.6666666665, ans=0.125 2023-11-26 02:45:59,536 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.757e+01 8.847e+01 9.604e+01 1.040e+02 3.358e+02, threshold=1.921e+02, percent-clipped=1.0 2023-11-26 02:46:07,456 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8600, loss[loss=0.07966, simple_loss=0.1153, pruned_loss=0.01513, audio_tagging_loss=0.006902, over 15885.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09083, pruned_loss=0.01264, audio_tagging_loss=0.008782, over 3048235.46 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:46:29,597 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477550 2023-11-26 02:46:29,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3183653.3333333335, ans=0.2 2023-11-26 02:46:44,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3183720.0, ans=0.125 2023-11-26 02:46:45,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3183720.0, ans=0.1 2023-11-26 02:47:03,346 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8650, loss[loss=0.06759, simple_loss=0.0957, pruned_loss=0.01187, audio_tagging_loss=0.007864, over 14999.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.09137, pruned_loss=0.0127, audio_tagging_loss=0.008789, over 3045662.01 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:47:17,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3183920.0, ans=0.125 2023-11-26 02:47:20,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3183920.0, ans=0.125 2023-11-26 02:47:21,900 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.60 vs. 
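
Note: the entry above is the only clipping event in this section: the window maximum 3.358e+02 exceeds the threshold 1.921e+02, and percent-clipped rises to 1.0 (one percent of the summarized batches) where every other entry reports 0.0. The percentage itself is a one-liner:

import torch

norms = torch.tensor([92.0] * 99 + [335.8])  # illustrative window, one outlier
threshold = 192.1
print(100.0 * (norms > threshold).float().mean().item())  # -> 1.0
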
limit=15.0 2023-11-26 02:47:24,868 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477600 2023-11-26 02:47:24,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3183986.6666666665, ans=0.0 2023-11-26 02:47:25,049 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:47:26,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3183986.6666666665, ans=0.125 2023-11-26 02:47:28,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3183986.6666666665, ans=0.125 2023-11-26 02:47:50,491 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.395e+01 8.714e+01 9.384e+01 1.008e+02 1.495e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 02:47:51,151 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.22 vs. limit=15.0 2023-11-26 02:47:54,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3184120.0, ans=0.125 2023-11-26 02:47:57,879 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8700, loss[loss=0.0846, simple_loss=0.1223, pruned_loss=0.01613, audio_tagging_loss=0.007328, over 15546.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.09118, pruned_loss=0.01286, audio_tagging_loss=0.008929, over 3042017.27 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:47:59,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3184186.6666666665, ans=0.04949747468305833 2023-11-26 02:48:03,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3184186.6666666665, ans=0.0 2023-11-26 02:48:21,305 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477650 2023-11-26 02:48:25,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3184320.0, ans=0.125 2023-11-26 02:48:31,368 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.85 vs. limit=22.5 2023-11-26 02:48:53,640 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.10 vs. limit=10.0 2023-11-26 02:48:54,006 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8750, loss[loss=0.06115, simple_loss=0.0754, pruned_loss=0.01248, audio_tagging_loss=0.01097, over 14818.00 frames. ], tot_loss[loss=0.06754, simple_loss=0.09149, pruned_loss=0.01279, audio_tagging_loss=0.008997, over 3042058.35 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:48:54,711 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.54 vs. 
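
Note: the scaling.py:1118 WithLoss lines report named auxiliary losses attached to the self-attention weights; every occurrence in this section shows loss-sum=0.000e+00, so the penalty is inactive (or exactly satisfied) at these batches. A hedged sketch of a wrapper that tracks such a named penalty; the interface is assumed, not icefall's:

import torch

class WithLoss(torch.nn.Module):
    """Sketch: wrap a module and track a named auxiliary penalty on its
    output; the caller is expected to add `aux` into the training loss."""
    def __init__(self, module, name, penalty_fn):
        super().__init__()
        self.module, self.name, self.penalty_fn = module, name, penalty_fn
        self.loss_sum = 0.0

    def forward(self, *args, **kwargs):
        out = self.module(*args, **kwargs)
        aux = self.penalty_fn(out)        # scalar tensor
        self.loss_sum += float(aux.detach())
        print(f"WithLoss: name={self.name}, loss-sum={self.loss_sum:.3e}")
        return out, aux
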
limit=22.5 2023-11-26 02:49:13,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3184586.6666666665, ans=0.125 2023-11-26 02:49:16,031 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477700 2023-11-26 02:49:18,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3184653.3333333335, ans=0.125 2023-11-26 02:49:21,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3184653.3333333335, ans=0.0 2023-11-26 02:49:29,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3184720.0, ans=0.0 2023-11-26 02:49:36,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3184720.0, ans=0.125 2023-11-26 02:49:41,954 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.014e+01 8.883e+01 9.536e+01 1.029e+02 1.726e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-26 02:49:43,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3184786.6666666665, ans=0.125 2023-11-26 02:49:46,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3184786.6666666665, ans=0.2 2023-11-26 02:49:49,891 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8800, loss[loss=0.08182, simple_loss=0.1113, pruned_loss=0.01797, audio_tagging_loss=0.00818, over 15291.00 frames. ], tot_loss[loss=0.06787, simple_loss=0.09175, pruned_loss=0.01293, audio_tagging_loss=0.009075, over 3044838.43 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:49:50,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3184853.3333333335, ans=0.0 2023-11-26 02:50:02,103 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.21 vs. limit=10.0 2023-11-26 02:50:11,220 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477750 2023-11-26 02:50:12,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3184986.6666666665, ans=0.1 2023-11-26 02:50:22,391 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.73 vs. limit=22.5 2023-11-26 02:50:25,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3185053.3333333335, ans=0.0 2023-11-26 02:50:44,759 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8850, loss[loss=0.07463, simple_loss=0.09569, pruned_loss=0.01937, audio_tagging_loss=0.007417, over 14006.00 frames. ], tot_loss[loss=0.06818, simple_loss=0.09214, pruned_loss=0.01306, audio_tagging_loss=0.009055, over 3045075.98 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:50:56,853 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 02:51:01,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3185253.3333333335, ans=0.125 2023-11-26 02:51:02,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3185253.3333333335, ans=0.0 2023-11-26 02:51:06,947 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477800 2023-11-26 02:51:11,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3185320.0, ans=0.125 2023-11-26 02:51:14,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3185320.0, ans=0.2 2023-11-26 02:51:31,907 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.135e+01 8.669e+01 9.316e+01 9.941e+01 1.332e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-26 02:51:39,846 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8900, loss[loss=0.07713, simple_loss=0.1079, pruned_loss=0.01418, audio_tagging_loss=0.008998, over 15500.00 frames. ], tot_loss[loss=0.06799, simple_loss=0.0919, pruned_loss=0.01306, audio_tagging_loss=0.008985, over 3041220.86 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:51:48,981 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.59 vs. limit=10.0 2023-11-26 02:51:50,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3185586.6666666665, ans=0.125 2023-11-26 02:51:50,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3185586.6666666665, ans=0.0 2023-11-26 02:51:58,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3185586.6666666665, ans=0.125 2023-11-26 02:52:02,495 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477850 2023-11-26 02:52:11,395 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.19 vs. limit=12.0 2023-11-26 02:52:35,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3185853.3333333335, ans=0.0 2023-11-26 02:52:36,334 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 8950, loss[loss=0.0634, simple_loss=0.09248, pruned_loss=0.01084, audio_tagging_loss=0.006323, over 15213.00 frames. ], tot_loss[loss=0.06813, simple_loss=0.09258, pruned_loss=0.01308, audio_tagging_loss=0.008755, over 3053940.50 frames. 
], batch size: 58, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:52:36,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3185853.3333333335, ans=0.125 2023-11-26 02:52:44,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3185853.3333333335, ans=0.1 2023-11-26 02:52:49,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3185920.0, ans=0.125 2023-11-26 02:52:55,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3185920.0, ans=0.125 2023-11-26 02:52:57,352 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477900 2023-11-26 02:53:17,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3186053.3333333335, ans=0.0 2023-11-26 02:53:20,110 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.09 vs. limit=15.0 2023-11-26 02:53:23,904 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.727e+01 8.837e+01 9.264e+01 1.015e+02 1.330e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-26 02:53:31,284 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9000, loss[loss=0.07661, simple_loss=0.1031, pruned_loss=0.01742, audio_tagging_loss=0.007656, over 15935.00 frames. ], tot_loss[loss=0.06779, simple_loss=0.09223, pruned_loss=0.01294, audio_tagging_loss=0.00873, over 3052968.84 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:53:31,286 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-26 02:54:03,243 INFO [train_asr.py:1267] (0/4) Epoch 40, validation: loss=0.05846, simple_loss=0.05059, pruned_loss=0.005121, audio_tagging_loss=0.02804, over 4681554.00 frames. 2023-11-26 02:54:03,244 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-26 02:54:25,304 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 477950 2023-11-26 02:54:25,678 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.20 vs. limit=22.5 2023-11-26 02:54:40,766 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.70 vs. limit=15.0 2023-11-26 02:54:58,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3186520.0, ans=0.1 2023-11-26 02:54:59,519 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9050, loss[loss=0.06473, simple_loss=0.08423, pruned_loss=0.01313, audio_tagging_loss=0.009483, over 15561.00 frames. ], tot_loss[loss=0.06809, simple_loss=0.09243, pruned_loss=0.01314, audio_tagging_loss=0.008738, over 3051966.73 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:55:03,401 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.92 vs. 
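
Note: at batch 9000 the loop pauses for validation (train_asr.py:1258/1267 above). The same loss decomposition holds there: 0.5 * 0.05059 + 0.005121 + 0.02804 ~= 0.05846 over ~4.68M frames, with the audio-tagging term dominating on this eval set, and peak GPU memory is read from the CUDA allocator. A sketch of such a validation pass; model_forward is an assumed helper:

import torch

def compute_validation_loss(model, dataloader, device):
    model.eval()
    tot_loss, tot_frames = 0.0, 0
    with torch.no_grad():
        for batch in dataloader:
            loss, num_frames = model_forward(model, batch, device)
            tot_loss += float(loss) * num_frames
            tot_frames += num_frames
    model.train()
    print(f"validation: loss={tot_loss / tot_frames:.4g}, "
          f"over {tot_frames} frames")
    print(f"Maximum memory allocated so far is "
          f"{torch.cuda.max_memory_allocated(device) // 2**20}MB")
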
limit=12.0 2023-11-26 02:55:06,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3186520.0, ans=0.125 2023-11-26 02:55:20,701 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478000 2023-11-26 02:55:48,459 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.924e+01 8.738e+01 9.239e+01 1.027e+02 1.338e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-26 02:55:50,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3186786.6666666665, ans=0.0 2023-11-26 02:55:54,936 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9100, loss[loss=0.05598, simple_loss=0.07394, pruned_loss=0.01058, audio_tagging_loss=0.008431, over 15315.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.09174, pruned_loss=0.01285, audio_tagging_loss=0.008665, over 3049791.38 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:55:56,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3186853.3333333335, ans=0.125 2023-11-26 02:56:04,196 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2023-11-26 02:56:17,385 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478050 2023-11-26 02:56:29,347 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.88 vs. limit=15.0 2023-11-26 02:56:39,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3187120.0, ans=0.125 2023-11-26 02:56:41,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3187120.0, ans=0.125 2023-11-26 02:56:47,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=3187120.0, ans=15.0 2023-11-26 02:56:50,661 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9150, loss[loss=0.05986, simple_loss=0.08428, pruned_loss=0.009315, audio_tagging_loss=0.008399, over 14777.00 frames. ], tot_loss[loss=0.06727, simple_loss=0.09132, pruned_loss=0.01287, audio_tagging_loss=0.008739, over 3048297.49 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 02:57:09,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3187253.3333333335, ans=0.1 2023-11-26 02:57:13,560 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478100 2023-11-26 02:57:37,740 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.21 vs. limit=12.0 2023-11-26 02:57:38,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3187453.3333333335, ans=0.0 2023-11-26 02:57:39,820 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.230e+01 8.630e+01 9.008e+01 9.650e+01 1.509e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-26 02:57:46,738 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9200, loss[loss=0.04734, simple_loss=0.0661, pruned_loss=0.007239, audio_tagging_loss=0.007048, over 15480.00 frames. 
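
Note: the learning rate ticks down from 1.69e-03 to 1.68e-03 at batch 9150 above. The values are consistent with icefall's Eden schedule, lr = base_lr * ((step/lr_batches)^2 + 1)^-0.25 * ((epoch/lr_epochs)^2 + 1)^-0.25; with this run's base_lr=0.045, lr_batches=7500, lr_epochs=3.5, a step count near 478k, and an epoch value near 39 (depending on how the epoch is indexed), that gives about 1.68e-03 to 1.69e-03, right at the logged transition. Treat the check as approximate:

def eden_lr(base_lr, step, epoch, lr_batches=7500.0, lr_epochs=3.5):
    # Eden schedule as used in icefall; step/epoch inputs read off this log
    return (base_lr
            * ((step / lr_batches) ** 2 + 1) ** -0.25
            * ((epoch / lr_epochs) ** 2 + 1) ** -0.25)

print(eden_lr(0.045, step=478_000, epoch=39))  # ~1.685e-03
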
], tot_loss[loss=0.06709, simple_loss=0.09129, pruned_loss=0.01273, audio_tagging_loss=0.008718, over 3050895.25 frames. ], batch size: 60, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 02:57:46,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3187520.0, ans=0.125 2023-11-26 02:57:58,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3187586.6666666665, ans=0.125 2023-11-26 02:58:08,660 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478150 2023-11-26 02:58:24,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3187720.0, ans=0.05 2023-11-26 02:58:32,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3187786.6666666665, ans=0.1 2023-11-26 02:58:37,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3187786.6666666665, ans=10.0 2023-11-26 02:58:42,643 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9250, loss[loss=0.06671, simple_loss=0.09234, pruned_loss=0.0126, audio_tagging_loss=0.007943, over 15693.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09115, pruned_loss=0.01285, audio_tagging_loss=0.00879, over 3057968.75 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 02:59:01,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3187920.0, ans=0.1 2023-11-26 02:59:04,969 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478200 2023-11-26 02:59:06,578 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:59:31,492 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.730e+01 8.687e+01 9.467e+01 1.008e+02 1.287e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 02:59:37,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3188186.6666666665, ans=0.2 2023-11-26 02:59:38,554 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9300, loss[loss=0.06345, simple_loss=0.08215, pruned_loss=0.01217, audio_tagging_loss=0.01021, over 14584.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09114, pruned_loss=0.01275, audio_tagging_loss=0.008832, over 3064626.46 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 02:59:44,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3188186.6666666665, ans=0.125 2023-11-26 02:59:48,045 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.35 vs. limit=15.0 2023-11-26 02:59:51,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3188253.3333333335, ans=0.125 2023-11-26 03:00:00,885 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478250 2023-11-26 03:00:09,549 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.93 vs. 
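
Note: loss[...] is the current batch while tot_loss[... over ~3.0M frames] aggregates a recent window of batches; the natural reading is a frame-weighted average, accumulated as sum(loss * frames) / sum(frames), with the window holding steady around 3.03M-3.06M frames through this section. A sketch of that bookkeeping, with the window/reset policy assumed:

# Frame-weighted running average behind the tot_loss[...] fields
# (the reset policy is an assumption, not the recipe's exact one).
class LossTracker:
    def __init__(self):
        self.num, self.den = 0.0, 0

    def update(self, loss: float, frames: int):
        self.num += loss * frames
        self.den += frames

    def value(self) -> float:
        return self.num / max(self.den, 1)

tracker = LossTracker()
tracker.update(0.04734, 15480)   # the "batch 9200" entry above
print(f"tot_loss={tracker.value():.4g}, over {tracker.den} frames")
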
limit=15.0 2023-11-26 03:00:11,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3188386.6666666665, ans=0.125 2023-11-26 03:00:25,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3188453.3333333335, ans=0.0 2023-11-26 03:00:34,956 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9350, loss[loss=0.0458, simple_loss=0.05404, pruned_loss=0.00617, audio_tagging_loss=0.01261, over 16211.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09027, pruned_loss=0.01248, audio_tagging_loss=0.008891, over 3059543.58 frames. ], batch size: 64, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:00:56,909 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478300 2023-11-26 03:00:58,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3188653.3333333335, ans=0.05 2023-11-26 03:00:58,511 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.28 vs. limit=15.0 2023-11-26 03:01:06,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3188653.3333333335, ans=0.125 2023-11-26 03:01:17,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3188720.0, ans=0.1 2023-11-26 03:01:18,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3188786.6666666665, ans=0.1 2023-11-26 03:01:23,926 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 8.528e+01 9.262e+01 1.004e+02 1.387e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-26 03:01:26,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3188786.6666666665, ans=0.015 2023-11-26 03:01:30,309 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9400, loss[loss=0.08127, simple_loss=0.1048, pruned_loss=0.01861, audio_tagging_loss=0.01026, over 15150.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.08989, pruned_loss=0.01246, audio_tagging_loss=0.008961, over 3052600.09 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:01:50,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3188920.0, ans=0.125 2023-11-26 03:01:52,674 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478350 2023-11-26 03:02:15,937 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.72 vs. limit=22.5 2023-11-26 03:02:24,989 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 03:02:26,071 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9450, loss[loss=0.06454, simple_loss=0.08992, pruned_loss=0.01191, audio_tagging_loss=0.007679, over 14993.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09074, pruned_loss=0.01268, audio_tagging_loss=0.00898, over 3048117.22 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:02:34,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3189186.6666666665, ans=0.125 2023-11-26 03:02:41,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3189253.3333333335, ans=0.1 2023-11-26 03:02:45,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3189253.3333333335, ans=0.1 2023-11-26 03:02:49,008 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478400 2023-11-26 03:02:49,575 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.83 vs. limit=15.0 2023-11-26 03:02:53,026 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.77 vs. limit=12.0 2023-11-26 03:03:09,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3189386.6666666665, ans=0.0 2023-11-26 03:03:16,855 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.310e+01 8.757e+01 9.306e+01 1.009e+02 1.204e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 03:03:22,606 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9500, loss[loss=0.07251, simple_loss=0.1, pruned_loss=0.0141, audio_tagging_loss=0.008395, over 14687.00 frames. ], tot_loss[loss=0.06736, simple_loss=0.09126, pruned_loss=0.01275, audio_tagging_loss=0.008983, over 3051544.17 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:03:31,246 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.83 vs. limit=6.0 2023-11-26 03:03:41,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3189586.6666666665, ans=0.125 2023-11-26 03:03:45,024 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478450 2023-11-26 03:03:47,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3189653.3333333335, ans=0.125 2023-11-26 03:03:48,502 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:03:48,957 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.93 vs. limit=15.0 2023-11-26 03:03:54,547 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2023-11-26 03:03:57,643 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. 
limit=6.0 2023-11-26 03:04:15,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3189786.6666666665, ans=0.125 2023-11-26 03:04:16,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3189786.6666666665, ans=0.125 2023-11-26 03:04:18,263 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9550, loss[loss=0.0665, simple_loss=0.08578, pruned_loss=0.01331, audio_tagging_loss=0.01029, over 14342.00 frames. ], tot_loss[loss=0.06821, simple_loss=0.09257, pruned_loss=0.01291, audio_tagging_loss=0.009013, over 3052991.29 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:04:21,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3189853.3333333335, ans=0.0 2023-11-26 03:04:22,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3189853.3333333335, ans=0.0 2023-11-26 03:04:40,887 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478500 2023-11-26 03:04:54,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3190053.3333333335, ans=0.125 2023-11-26 03:05:03,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3190120.0, ans=0.125 2023-11-26 03:05:06,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3190120.0, ans=0.2 2023-11-26 03:05:08,238 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.252e+01 8.706e+01 9.165e+01 9.979e+01 1.359e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-26 03:05:14,052 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9600, loss[loss=0.07974, simple_loss=0.1132, pruned_loss=0.01541, audio_tagging_loss=0.007731, over 15871.00 frames. ], tot_loss[loss=0.06781, simple_loss=0.09199, pruned_loss=0.01274, audio_tagging_loss=0.009076, over 3056607.13 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:05:37,114 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478550 2023-11-26 03:05:44,960 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.89 vs. limit=15.0 2023-11-26 03:06:07,387 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.96 vs. limit=15.0 2023-11-26 03:06:10,228 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9650, loss[loss=0.07634, simple_loss=0.1071, pruned_loss=0.01531, audio_tagging_loss=0.007477, over 16028.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.09111, pruned_loss=0.01261, audio_tagging_loss=0.009131, over 3062228.47 frames. 
], batch size: 63, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:06:10,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3190520.0, ans=0.125 2023-11-26 03:06:15,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3190520.0, ans=0.2 2023-11-26 03:06:16,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3190520.0, ans=0.1 2023-11-26 03:06:31,793 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478600 2023-11-26 03:06:49,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3190720.0, ans=0.125 2023-11-26 03:07:00,682 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.911e+01 8.580e+01 9.247e+01 1.001e+02 1.210e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-26 03:07:05,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3190853.3333333335, ans=0.125 2023-11-26 03:07:06,034 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9700, loss[loss=0.0782, simple_loss=0.1091, pruned_loss=0.01479, audio_tagging_loss=0.008884, over 14694.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09014, pruned_loss=0.01244, audio_tagging_loss=0.009024, over 3060440.98 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:07:28,289 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478650 2023-11-26 03:07:32,825 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.73 vs. limit=15.0 2023-11-26 03:07:46,636 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.78 vs. limit=15.0 2023-11-26 03:07:50,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3191120.0, ans=0.2 2023-11-26 03:07:52,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3191120.0, ans=0.125 2023-11-26 03:08:01,013 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9750, loss[loss=0.07534, simple_loss=0.101, pruned_loss=0.01586, audio_tagging_loss=0.008978, over 15238.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.08983, pruned_loss=0.01242, audio_tagging_loss=0.00887, over 3055489.86 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:08:21,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3191253.3333333335, ans=0.0 2023-11-26 03:08:24,557 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478700 2023-11-26 03:08:29,200 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.50 vs. 
limit=15.0 2023-11-26 03:08:29,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3191320.0, ans=0.2 2023-11-26 03:08:30,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3191320.0, ans=0.125 2023-11-26 03:08:48,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3191453.3333333335, ans=0.2 2023-11-26 03:08:52,037 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.291e+01 8.589e+01 9.136e+01 9.841e+01 1.317e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-26 03:08:54,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3191453.3333333335, ans=0.125 2023-11-26 03:08:56,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3191520.0, ans=0.125 2023-11-26 03:08:57,352 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9800, loss[loss=0.06248, simple_loss=0.08816, pruned_loss=0.0119, audio_tagging_loss=0.006498, over 14853.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09008, pruned_loss=0.01237, audio_tagging_loss=0.008708, over 3045261.12 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:09:03,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3191520.0, ans=0.125 2023-11-26 03:09:05,271 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.43 vs. limit=22.5 2023-11-26 03:09:19,734 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478750 2023-11-26 03:09:45,331 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.09 vs. limit=10.0 2023-11-26 03:09:48,431 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 03:09:53,712 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9850, loss[loss=0.07927, simple_loss=0.1053, pruned_loss=0.01772, audio_tagging_loss=0.008923, over 14910.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09093, pruned_loss=0.01269, audio_tagging_loss=0.008616, over 3047968.14 frames. 
], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:10:01,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3191853.3333333335, ans=0.0 2023-11-26 03:10:07,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3191920.0, ans=0.125 2023-11-26 03:10:15,535 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478800 2023-11-26 03:10:15,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3191986.6666666665, ans=0.2 2023-11-26 03:10:36,675 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.66 vs. limit=15.0 2023-11-26 03:10:44,533 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.907e+01 8.656e+01 9.156e+01 9.841e+01 1.408e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-26 03:10:48,206 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2023-11-26 03:10:48,801 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9900, loss[loss=0.06417, simple_loss=0.09038, pruned_loss=0.01009, audio_tagging_loss=0.008883, over 14741.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09071, pruned_loss=0.01264, audio_tagging_loss=0.008705, over 3046977.27 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:11:06,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3192253.3333333335, ans=0.125 2023-11-26 03:11:11,608 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478850 2023-11-26 03:11:16,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3192320.0, ans=0.125 2023-11-26 03:11:19,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3192320.0, ans=0.125 2023-11-26 03:11:36,169 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:11:44,258 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 9950, loss[loss=0.0652, simple_loss=0.09068, pruned_loss=0.01204, audio_tagging_loss=0.007819, over 15379.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09047, pruned_loss=0.01265, audio_tagging_loss=0.008771, over 3037736.32 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:11:49,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3192520.0, ans=0.125 2023-11-26 03:11:58,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3192586.6666666665, ans=0.1 2023-11-26 03:12:02,025 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.21 vs. limit=15.0 2023-11-26 03:12:06,726 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478900 2023-11-26 03:12:24,720 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.01 vs. 
limit=15.0 2023-11-26 03:12:25,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3192720.0, ans=0.0 2023-11-26 03:12:36,544 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.281e+01 8.464e+01 9.280e+01 9.880e+01 1.249e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 03:12:40,794 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10000, loss[loss=0.07274, simple_loss=0.09982, pruned_loss=0.017, audio_tagging_loss=0.005832, over 13599.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09071, pruned_loss=0.01263, audio_tagging_loss=0.008709, over 3041129.15 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:12:43,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3192853.3333333335, ans=0.125 2023-11-26 03:12:50,030 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.10 vs. limit=15.0 2023-11-26 03:13:02,286 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 478950 2023-11-26 03:13:06,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3192986.6666666665, ans=10.0 2023-11-26 03:13:17,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3193053.3333333335, ans=0.125 2023-11-26 03:13:21,617 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:13:22,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3193053.3333333335, ans=0.1 2023-11-26 03:13:23,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3193053.3333333335, ans=0.125 2023-11-26 03:13:36,319 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10050, loss[loss=0.08322, simple_loss=0.1158, pruned_loss=0.01722, audio_tagging_loss=0.008117, over 15003.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08941, pruned_loss=0.01241, audio_tagging_loss=0.008863, over 3037461.57 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:13:43,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3193186.6666666665, ans=0.0 2023-11-26 03:13:55,106 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.02 vs. limit=15.0 2023-11-26 03:13:58,189 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.73 vs. limit=15.0 2023-11-26 03:13:58,679 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479000 2023-11-26 03:14:26,607 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.210e+01 8.389e+01 9.159e+01 9.802e+01 1.976e+02, threshold=1.832e+02, percent-clipped=1.0 2023-11-26 03:14:31,505 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10100, loss[loss=0.06857, simple_loss=0.09168, pruned_loss=0.01297, audio_tagging_loss=0.009757, over 14976.00 frames. 
], tot_loss[loss=0.0657, simple_loss=0.08872, pruned_loss=0.01237, audio_tagging_loss=0.008977, over 3035796.43 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:14:39,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3193520.0, ans=0.0 2023-11-26 03:14:42,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3193586.6666666665, ans=0.125 2023-11-26 03:14:53,765 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479050 2023-11-26 03:15:13,941 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.26 vs. limit=6.0 2023-11-26 03:15:16,531 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 03:15:18,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3193786.6666666665, ans=0.0 2023-11-26 03:15:21,489 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.73 vs. limit=15.0 2023-11-26 03:15:27,575 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10150, loss[loss=0.05816, simple_loss=0.07737, pruned_loss=0.009391, audio_tagging_loss=0.01008, over 15619.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08936, pruned_loss=0.01257, audio_tagging_loss=0.008913, over 3038370.83 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:15:32,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3193853.3333333335, ans=0.07 2023-11-26 03:15:41,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3193920.0, ans=0.0 2023-11-26 03:15:42,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3193920.0, ans=0.2 2023-11-26 03:15:49,060 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479100 2023-11-26 03:15:53,169 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 03:16:09,503 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.59 vs. 
limit=15.0 2023-11-26 03:16:14,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3194120.0, ans=0.2 2023-11-26 03:16:15,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3194120.0, ans=0.125 2023-11-26 03:16:17,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3194120.0, ans=0.125 2023-11-26 03:16:18,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3194120.0, ans=0.1 2023-11-26 03:16:19,620 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.429e+01 8.867e+01 9.466e+01 1.017e+02 1.236e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 03:16:22,874 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10200, loss[loss=0.08933, simple_loss=0.1204, pruned_loss=0.02016, audio_tagging_loss=0.008961, over 14847.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09051, pruned_loss=0.01281, audio_tagging_loss=0.00893, over 3046927.17 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:16:34,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3194253.3333333335, ans=0.125 2023-11-26 03:16:45,238 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 03:16:45,300 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479150 2023-11-26 03:16:54,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3194320.0, ans=0.0 2023-11-26 03:17:03,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3194386.6666666665, ans=0.04949747468305833 2023-11-26 03:17:17,602 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10250, loss[loss=0.06216, simple_loss=0.0781, pruned_loss=0.01425, audio_tagging_loss=0.00886, over 15318.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.08923, pruned_loss=0.01249, audio_tagging_loss=0.009131, over 3051366.51 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:17:33,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3194586.6666666665, ans=0.125 2023-11-26 03:17:41,079 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479200 2023-11-26 03:17:47,028 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=15.0 2023-11-26 03:17:51,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3194720.0, ans=0.2 2023-11-26 03:17:54,799 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.70 vs. 
limit=15.0 2023-11-26 03:18:06,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3194786.6666666665, ans=0.2 2023-11-26 03:18:10,890 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.032e+01 8.528e+01 9.326e+01 1.007e+02 1.324e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 03:18:14,618 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10300, loss[loss=0.0804, simple_loss=0.107, pruned_loss=0.01912, audio_tagging_loss=0.007787, over 15570.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.08924, pruned_loss=0.01267, audio_tagging_loss=0.009202, over 3053244.04 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:18:24,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3194920.0, ans=0.125 2023-11-26 03:18:32,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3194920.0, ans=0.125 2023-11-26 03:18:33,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3194920.0, ans=0.125 2023-11-26 03:18:34,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3194920.0, ans=0.125 2023-11-26 03:18:36,416 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479250 2023-11-26 03:18:39,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3194986.6666666665, ans=0.125 2023-11-26 03:18:46,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3195053.3333333335, ans=0.125 2023-11-26 03:18:50,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3195053.3333333335, ans=0.125 2023-11-26 03:19:06,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3195120.0, ans=0.125 2023-11-26 03:19:10,599 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10350, loss[loss=0.05289, simple_loss=0.06468, pruned_loss=0.005781, audio_tagging_loss=0.01477, over 15355.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.08954, pruned_loss=0.01254, audio_tagging_loss=0.009284, over 3047381.09 frames. ], batch size: 61, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:19:16,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3195186.6666666665, ans=0.125 2023-11-26 03:19:28,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3195253.3333333335, ans=0.0 2023-11-26 03:19:32,378 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479300 2023-11-26 03:19:45,315 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.76 vs. limit=15.0 2023-11-26 03:19:45,397 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.64 vs. limit=10.0 2023-11-26 03:19:46,566 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.93 vs. 
limit=15.0 2023-11-26 03:19:51,434 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:19:56,863 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.59 vs. limit=22.5 2023-11-26 03:20:03,836 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.518e+01 8.860e+01 9.371e+01 1.044e+02 1.411e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-26 03:20:06,037 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10400, loss[loss=0.0698, simple_loss=0.09353, pruned_loss=0.01372, audio_tagging_loss=0.009318, over 15238.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09049, pruned_loss=0.01262, audio_tagging_loss=0.009271, over 3048374.67 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:20:29,575 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479350 2023-11-26 03:20:29,956 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.80 vs. limit=10.0 2023-11-26 03:20:40,604 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.52 vs. limit=12.0 2023-11-26 03:20:45,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3195720.0, ans=0.125 2023-11-26 03:20:48,083 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.34 vs. limit=15.0 2023-11-26 03:20:54,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3195786.6666666665, ans=0.125 2023-11-26 03:21:02,552 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10450, loss[loss=0.0528, simple_loss=0.07117, pruned_loss=0.007848, audio_tagging_loss=0.009365, over 15726.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.08983, pruned_loss=0.01261, audio_tagging_loss=0.009242, over 3041721.08 frames. ], batch size: 62, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:21:02,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3195853.3333333335, ans=0.0 2023-11-26 03:21:12,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3195853.3333333335, ans=0.025 2023-11-26 03:21:24,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3195986.6666666665, ans=0.125 2023-11-26 03:21:25,151 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479400 2023-11-26 03:21:31,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3195986.6666666665, ans=0.2 2023-11-26 03:21:35,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3196053.3333333335, ans=0.0 2023-11-26 03:21:48,859 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.73 vs. 
limit=15.0 2023-11-26 03:21:56,747 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.499e+01 8.740e+01 9.314e+01 1.011e+02 1.304e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-26 03:21:59,374 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10500, loss[loss=0.06894, simple_loss=0.09226, pruned_loss=0.01492, audio_tagging_loss=0.00789, over 15356.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.08923, pruned_loss=0.0125, audio_tagging_loss=0.009178, over 3041906.65 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:22:13,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3196253.3333333335, ans=0.1 2023-11-26 03:22:15,486 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:22:20,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3196320.0, ans=0.2 2023-11-26 03:22:21,280 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479450 2023-11-26 03:22:25,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3196320.0, ans=0.0 2023-11-26 03:22:34,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3196386.6666666665, ans=0.125 2023-11-26 03:22:35,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3196386.6666666665, ans=0.95 2023-11-26 03:22:46,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3196453.3333333335, ans=0.125 2023-11-26 03:22:50,947 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.01 vs. limit=15.0 2023-11-26 03:22:54,781 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10550, loss[loss=0.08178, simple_loss=0.1197, pruned_loss=0.01687, audio_tagging_loss=0.00507, over 15001.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09057, pruned_loss=0.01273, audio_tagging_loss=0.008991, over 3043381.55 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:22:59,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3196520.0, ans=0.0 2023-11-26 03:23:00,188 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.30 vs. limit=6.0 2023-11-26 03:23:17,157 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.85 vs. limit=12.0 2023-11-26 03:23:17,850 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479500 2023-11-26 03:23:21,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3196653.3333333335, ans=0.0 2023-11-26 03:23:48,525 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.377e+01 8.656e+01 9.441e+01 1.040e+02 1.486e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-26 03:23:50,621 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10600, loss[loss=0.06386, simple_loss=0.08553, pruned_loss=0.01227, audio_tagging_loss=0.008824, over 15674.00 frames. 
], tot_loss[loss=0.06642, simple_loss=0.08964, pruned_loss=0.01254, audio_tagging_loss=0.009063, over 3041010.31 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:23:56,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3196853.3333333335, ans=0.0 2023-11-26 03:23:57,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3196853.3333333335, ans=0.0 2023-11-26 03:23:58,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=3196853.3333333335, ans=0.025 2023-11-26 03:24:00,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3196853.3333333335, ans=0.07 2023-11-26 03:24:13,343 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479550 2023-11-26 03:24:15,815 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.91 vs. limit=6.0 2023-11-26 03:24:19,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3196986.6666666665, ans=0.0 2023-11-26 03:24:21,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3196986.6666666665, ans=0.0 2023-11-26 03:24:39,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3197120.0, ans=0.025 2023-11-26 03:24:46,464 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10650, loss[loss=0.05864, simple_loss=0.08547, pruned_loss=0.008644, audio_tagging_loss=0.007258, over 16157.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08982, pruned_loss=0.01244, audio_tagging_loss=0.00887, over 3044856.88 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:25:04,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3197253.3333333335, ans=0.125 2023-11-26 03:25:06,935 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0 2023-11-26 03:25:08,819 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479600 2023-11-26 03:25:28,682 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.87 vs. limit=22.5 2023-11-26 03:25:37,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3197453.3333333335, ans=0.125 2023-11-26 03:25:38,707 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:25:41,507 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.674e+01 8.472e+01 9.133e+01 9.975e+01 1.277e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-26 03:25:42,612 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10700, loss[loss=0.09166, simple_loss=0.1192, pruned_loss=0.02517, audio_tagging_loss=0.006906, over 14841.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09074, pruned_loss=0.01257, audio_tagging_loss=0.008762, over 3044633.90 frames. 
], batch size: 53, lr: 1.68e-03, grad_scale: 8.0 2023-11-26 03:25:50,754 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:25:51,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3197520.0, ans=0.125 2023-11-26 03:26:01,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3197586.6666666665, ans=0.125 2023-11-26 03:26:05,523 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479650 2023-11-26 03:26:11,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3197653.3333333335, ans=0.0 2023-11-26 03:26:23,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3197720.0, ans=0.125 2023-11-26 03:26:29,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3197786.6666666665, ans=0.2 2023-11-26 03:26:38,382 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10750, loss[loss=0.06519, simple_loss=0.08678, pruned_loss=0.0112, audio_tagging_loss=0.01061, over 15574.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09086, pruned_loss=0.01256, audio_tagging_loss=0.008774, over 3045472.00 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 8.0 2023-11-26 03:26:39,918 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.78 vs. limit=22.5 2023-11-26 03:26:45,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3197853.3333333335, ans=0.1 2023-11-26 03:27:00,719 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479700 2023-11-26 03:27:20,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3198053.3333333335, ans=0.125 2023-11-26 03:27:33,595 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.243e+01 8.525e+01 9.180e+01 9.934e+01 1.273e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-26 03:27:34,668 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10800, loss[loss=0.06125, simple_loss=0.08351, pruned_loss=0.01107, audio_tagging_loss=0.008426, over 15694.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09057, pruned_loss=0.01252, audio_tagging_loss=0.008776, over 3045629.84 frames. 
], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:27:36,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3198186.6666666665, ans=0.035 2023-11-26 03:27:56,911 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479750 2023-11-26 03:28:07,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3198386.6666666665, ans=0.0 2023-11-26 03:28:15,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3198386.6666666665, ans=0.0 2023-11-26 03:28:18,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3198453.3333333335, ans=0.1 2023-11-26 03:28:20,794 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.37 vs. limit=15.0 2023-11-26 03:28:23,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3198453.3333333335, ans=0.125 2023-11-26 03:28:29,956 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10850, loss[loss=0.06964, simple_loss=0.09971, pruned_loss=0.01218, audio_tagging_loss=0.007606, over 15775.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09026, pruned_loss=0.01251, audio_tagging_loss=0.008821, over 3049787.76 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:28:39,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3198520.0, ans=0.0 2023-11-26 03:28:40,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3198586.6666666665, ans=0.1 2023-11-26 03:28:44,368 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.01 vs. limit=15.0 2023-11-26 03:28:53,343 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479800 2023-11-26 03:29:08,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3198720.0, ans=0.125 2023-11-26 03:29:09,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3198720.0, ans=0.0 2023-11-26 03:29:18,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3198786.6666666665, ans=0.125 2023-11-26 03:29:24,294 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 03:29:25,261 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.001e+01 8.656e+01 9.276e+01 9.852e+01 1.367e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 03:29:26,355 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10900, loss[loss=0.03914, simple_loss=0.05117, pruned_loss=0.00556, audio_tagging_loss=0.008001, over 14370.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.09107, pruned_loss=0.01273, audio_tagging_loss=0.008914, over 3051709.82 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:29:27,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3198853.3333333335, ans=0.125 2023-11-26 03:29:29,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3198853.3333333335, ans=0.125 2023-11-26 03:29:48,679 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479850 2023-11-26 03:30:03,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3199053.3333333335, ans=0.0 2023-11-26 03:30:12,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3199120.0, ans=0.125 2023-11-26 03:30:22,724 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 10950, loss[loss=0.056, simple_loss=0.08168, pruned_loss=0.007982, audio_tagging_loss=0.007183, over 15107.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09094, pruned_loss=0.01276, audio_tagging_loss=0.008831, over 3048135.36 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:30:24,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3199186.6666666665, ans=0.95 2023-11-26 03:30:43,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3199320.0, ans=0.125 2023-11-26 03:30:44,442 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479900 2023-11-26 03:30:53,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3199320.0, ans=0.0 2023-11-26 03:31:10,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3199453.3333333335, ans=0.0 2023-11-26 03:31:12,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3199453.3333333335, ans=0.2 2023-11-26 03:31:16,797 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.102e+01 8.536e+01 9.275e+01 9.846e+01 1.244e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 03:31:17,920 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11000, loss[loss=0.06077, simple_loss=0.0775, pruned_loss=0.01292, audio_tagging_loss=0.009101, over 14596.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09038, pruned_loss=0.01268, audio_tagging_loss=0.008926, over 3046330.07 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:31:27,933 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 03:31:40,779 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 479950 2023-11-26 03:31:46,152 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.26 vs. limit=10.0 2023-11-26 03:32:08,254 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.85 vs. limit=15.0 2023-11-26 03:32:14,139 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11050, loss[loss=0.04485, simple_loss=0.05934, pruned_loss=0.006443, audio_tagging_loss=0.008737, over 14365.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09033, pruned_loss=0.01263, audio_tagging_loss=0.008981, over 3050166.18 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:32:36,590 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480000 2023-11-26 03:32:37,923 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-480000.pt 2023-11-26 03:32:42,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3199986.6666666665, ans=0.0 2023-11-26 03:33:02,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3200120.0, ans=0.125 2023-11-26 03:33:11,098 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.470e+01 8.630e+01 9.436e+01 1.031e+02 1.953e+02, threshold=1.887e+02, percent-clipped=1.0 2023-11-26 03:33:12,216 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11100, loss[loss=0.04913, simple_loss=0.06981, pruned_loss=0.005664, audio_tagging_loss=0.008559, over 15917.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.0899, pruned_loss=0.01263, audio_tagging_loss=0.009133, over 3050489.03 frames. ], batch size: 61, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:33:33,449 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480050 2023-11-26 03:33:47,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3200386.6666666665, ans=0.0 2023-11-26 03:33:52,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3200386.6666666665, ans=0.2 2023-11-26 03:33:57,019 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2023-11-26 03:34:06,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3200520.0, ans=0.0 2023-11-26 03:34:07,197 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11150, loss[loss=0.07425, simple_loss=0.09581, pruned_loss=0.01596, audio_tagging_loss=0.01039, over 14780.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.08978, pruned_loss=0.01276, audio_tagging_loss=0.009212, over 3046210.69 frames. 
], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:34:23,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3200586.6666666665, ans=0.125 2023-11-26 03:34:27,856 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.25 vs. limit=12.0 2023-11-26 03:34:29,459 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480100 2023-11-26 03:34:45,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3200720.0, ans=0.0 2023-11-26 03:34:47,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3200720.0, ans=0.125 2023-11-26 03:34:47,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3200720.0, ans=0.0 2023-11-26 03:34:48,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3200720.0, ans=0.2 2023-11-26 03:34:50,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3200786.6666666665, ans=0.125 2023-11-26 03:35:00,980 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.121e+01 8.918e+01 9.641e+01 1.031e+02 1.753e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-26 03:35:02,667 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11200, loss[loss=0.07996, simple_loss=0.1062, pruned_loss=0.01715, audio_tagging_loss=0.009729, over 14453.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.08981, pruned_loss=0.01272, audio_tagging_loss=0.009255, over 3050558.31 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:35:08,931 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=22.5 2023-11-26 03:35:17,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3200920.0, ans=0.0 2023-11-26 03:35:19,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3200920.0, ans=0.125 2023-11-26 03:35:21,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3200920.0, ans=0.125 2023-11-26 03:35:22,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3200920.0, ans=0.0 2023-11-26 03:35:25,401 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480150 2023-11-26 03:35:31,290 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.86 vs. limit=6.0 2023-11-26 03:35:37,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3201053.3333333335, ans=0.125 2023-11-26 03:35:56,859 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.50 vs. limit=15.0 2023-11-26 03:35:59,476 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11250, loss[loss=0.06756, simple_loss=0.08774, pruned_loss=0.0154, audio_tagging_loss=0.008283, over 15278.00 frames. 
], tot_loss[loss=0.06643, simple_loss=0.08886, pruned_loss=0.01272, audio_tagging_loss=0.009284, over 3048561.22 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:36:00,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3201186.6666666665, ans=0.0 2023-11-26 03:36:00,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3201186.6666666665, ans=0.0 2023-11-26 03:36:15,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3201253.3333333335, ans=0.07 2023-11-26 03:36:20,658 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480200 2023-11-26 03:36:35,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3201386.6666666665, ans=0.125 2023-11-26 03:36:53,606 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.596e+01 8.665e+01 9.306e+01 1.002e+02 1.136e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 03:36:54,681 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11300, loss[loss=0.05054, simple_loss=0.06095, pruned_loss=0.00902, audio_tagging_loss=0.01104, over 15192.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.08929, pruned_loss=0.01268, audio_tagging_loss=0.009166, over 3047539.64 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:36:57,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3201520.0, ans=0.0 2023-11-26 03:36:58,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3201520.0, ans=0.125 2023-11-26 03:37:00,497 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.01 vs. limit=22.5 2023-11-26 03:37:04,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3201586.6666666665, ans=0.125 2023-11-26 03:37:06,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3201586.6666666665, ans=0.0 2023-11-26 03:37:08,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3201586.6666666665, ans=0.125 2023-11-26 03:37:16,555 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480250 2023-11-26 03:37:20,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3201653.3333333335, ans=0.1 2023-11-26 03:37:20,741 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.74 vs. limit=15.0 2023-11-26 03:37:37,664 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.60 vs. 
limit=15.0 2023-11-26 03:37:43,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3201786.6666666665, ans=0.0 2023-11-26 03:37:45,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3201786.6666666665, ans=0.125 2023-11-26 03:37:49,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3201853.3333333335, ans=0.125 2023-11-26 03:37:49,988 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11350, loss[loss=0.06282, simple_loss=0.08603, pruned_loss=0.01154, audio_tagging_loss=0.008268, over 16412.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.08997, pruned_loss=0.01293, audio_tagging_loss=0.008976, over 3044437.75 frames. ], batch size: 60, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:37:52,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3201853.3333333335, ans=0.125 2023-11-26 03:37:52,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3201853.3333333335, ans=0.125 2023-11-26 03:38:01,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3201920.0, ans=0.125 2023-11-26 03:38:04,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3201920.0, ans=0.0 2023-11-26 03:38:05,807 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2023-11-26 03:38:13,020 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480300 2023-11-26 03:38:25,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3202053.3333333335, ans=0.125 2023-11-26 03:38:44,350 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.296e+01 8.638e+01 9.308e+01 1.022e+02 1.333e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-26 03:38:45,432 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11400, loss[loss=0.05441, simple_loss=0.07667, pruned_loss=0.008288, audio_tagging_loss=0.007785, over 15506.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.0902, pruned_loss=0.01288, audio_tagging_loss=0.008842, over 3039785.10 frames. ], batch size: 60, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:38:52,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3202186.6666666665, ans=0.0 2023-11-26 03:39:07,952 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480350 2023-11-26 03:39:21,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3202386.6666666665, ans=0.1 2023-11-26 03:39:25,015 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.50 vs. 
limit=15.0 2023-11-26 03:39:33,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3202453.3333333335, ans=0.0 2023-11-26 03:39:41,783 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11450, loss[loss=0.06281, simple_loss=0.08857, pruned_loss=0.008833, audio_tagging_loss=0.009698, over 15538.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09121, pruned_loss=0.01288, audio_tagging_loss=0.008644, over 3045565.83 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:39:43,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3202520.0, ans=0.125 2023-11-26 03:39:53,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3202586.6666666665, ans=0.0 2023-11-26 03:39:59,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3202586.6666666665, ans=0.125 2023-11-26 03:40:00,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3202586.6666666665, ans=0.125 2023-11-26 03:40:03,725 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480400 2023-11-26 03:40:27,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3202786.6666666665, ans=0.125 2023-11-26 03:40:37,359 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 8.830e+01 9.338e+01 1.004e+02 1.564e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 03:40:37,386 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11500, loss[loss=0.0725, simple_loss=0.08957, pruned_loss=0.01798, audio_tagging_loss=0.009731, over 14558.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09114, pruned_loss=0.01281, audio_tagging_loss=0.008695, over 3043042.18 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:40:40,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3202853.3333333335, ans=0.125 2023-11-26 03:40:48,934 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.61 vs. limit=22.5 2023-11-26 03:41:00,694 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480450 2023-11-26 03:41:15,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3203053.3333333335, ans=0.1 2023-11-26 03:41:21,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3203120.0, ans=0.125 2023-11-26 03:41:21,372 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.20 vs. limit=15.0 2023-11-26 03:41:29,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3203120.0, ans=0.125 2023-11-26 03:41:32,711 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.06 vs. 
limit=15.0 2023-11-26 03:41:33,096 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11550, loss[loss=0.0688, simple_loss=0.09591, pruned_loss=0.01493, audio_tagging_loss=0.005918, over 15800.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09014, pruned_loss=0.01268, audio_tagging_loss=0.008781, over 3036423.83 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:41:48,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3203253.3333333335, ans=0.0 2023-11-26 03:41:49,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3203253.3333333335, ans=0.125 2023-11-26 03:41:52,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3203253.3333333335, ans=0.0 2023-11-26 03:41:55,979 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480500 2023-11-26 03:42:03,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3203320.0, ans=0.5 2023-11-26 03:42:09,101 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 03:42:14,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3203386.6666666665, ans=0.2 2023-11-26 03:42:20,878 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.34 vs. limit=15.0 2023-11-26 03:42:29,046 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.474e+01 8.932e+01 9.599e+01 1.033e+02 1.724e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-26 03:42:29,072 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11600, loss[loss=0.06174, simple_loss=0.08741, pruned_loss=0.009894, audio_tagging_loss=0.008138, over 14472.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09066, pruned_loss=0.01272, audio_tagging_loss=0.008804, over 3036180.95 frames. 
], batch size: 55, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:42:30,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3203520.0, ans=0.125 2023-11-26 03:42:33,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3203520.0, ans=0.2 2023-11-26 03:42:34,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3203520.0, ans=0.125 2023-11-26 03:42:39,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3203586.6666666665, ans=0.125 2023-11-26 03:42:50,943 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480550 2023-11-26 03:42:53,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3203653.3333333335, ans=0.025 2023-11-26 03:42:56,634 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.46 vs. limit=12.0 2023-11-26 03:43:24,212 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11650, loss[loss=0.06128, simple_loss=0.08278, pruned_loss=0.01151, audio_tagging_loss=0.008379, over 15474.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09003, pruned_loss=0.01253, audio_tagging_loss=0.008873, over 3040958.88 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:43:31,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3203853.3333333335, ans=0.2 2023-11-26 03:43:46,881 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480600 2023-11-26 03:44:17,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3204120.0, ans=0.125 2023-11-26 03:44:19,952 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 8.387e+01 9.006e+01 9.801e+01 1.650e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-26 03:44:19,979 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11700, loss[loss=0.0536, simple_loss=0.06553, pruned_loss=0.009516, audio_tagging_loss=0.01132, over 15455.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09008, pruned_loss=0.01256, audio_tagging_loss=0.008843, over 3041990.64 frames. 
], batch size: 58, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:44:23,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3204186.6666666665, ans=0.0 2023-11-26 03:44:23,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3204186.6666666665, ans=0.0 2023-11-26 03:44:31,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3204253.3333333335, ans=0.1 2023-11-26 03:44:39,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3204253.3333333335, ans=0.125 2023-11-26 03:44:42,861 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480650 2023-11-26 03:44:43,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3204320.0, ans=0.1 2023-11-26 03:44:46,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3204320.0, ans=0.125 2023-11-26 03:44:46,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3204320.0, ans=0.05 2023-11-26 03:44:52,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3204386.6666666665, ans=0.0 2023-11-26 03:45:13,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3204453.3333333335, ans=0.125 2023-11-26 03:45:15,959 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11750, loss[loss=0.06954, simple_loss=0.09889, pruned_loss=0.01128, audio_tagging_loss=0.008821, over 15631.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09085, pruned_loss=0.01265, audio_tagging_loss=0.008832, over 3041300.34 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:45:22,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3204520.0, ans=0.125 2023-11-26 03:45:27,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3204586.6666666665, ans=0.2 2023-11-26 03:45:36,215 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.39 vs. limit=22.5 2023-11-26 03:45:38,304 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480700 2023-11-26 03:45:39,812 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.37 vs. limit=10.0 2023-11-26 03:45:46,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3204653.3333333335, ans=0.0 2023-11-26 03:45:54,747 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.25 vs. 
limit=15.0 2023-11-26 03:45:55,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3204720.0, ans=0.0 2023-11-26 03:46:11,480 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.706e+01 8.820e+01 9.557e+01 1.032e+02 1.520e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 03:46:11,511 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11800, loss[loss=0.0585, simple_loss=0.07032, pruned_loss=0.01383, audio_tagging_loss=0.00951, over 15659.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09052, pruned_loss=0.01265, audio_tagging_loss=0.008893, over 3047841.21 frames. ], batch size: 60, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:46:24,286 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.07 vs. limit=22.5 2023-11-26 03:46:34,413 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480750 2023-11-26 03:46:36,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3204986.6666666665, ans=0.125 2023-11-26 03:46:38,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3204986.6666666665, ans=0.125 2023-11-26 03:46:51,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3205053.3333333335, ans=10.0 2023-11-26 03:47:07,426 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11850, loss[loss=0.06691, simple_loss=0.0913, pruned_loss=0.01023, audio_tagging_loss=0.01104, over 15699.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09046, pruned_loss=0.01261, audio_tagging_loss=0.008956, over 3049895.81 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:47:07,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3205186.6666666665, ans=0.1 2023-11-26 03:47:14,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3205186.6666666665, ans=0.125 2023-11-26 03:47:29,854 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480800 2023-11-26 03:47:41,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3205386.6666666665, ans=0.125 2023-11-26 03:47:47,410 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.52 vs. limit=12.0 2023-11-26 03:48:03,885 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11900, loss[loss=0.05965, simple_loss=0.07884, pruned_loss=0.009103, audio_tagging_loss=0.01112, over 14733.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09062, pruned_loss=0.01269, audio_tagging_loss=0.009002, over 3045850.83 frames. 
], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:48:04,896 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.914e+01 8.863e+01 9.443e+01 1.007e+02 1.384e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-26 03:48:25,788 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480850 2023-11-26 03:48:42,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3205720.0, ans=0.1 2023-11-26 03:48:43,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3205720.0, ans=0.05 2023-11-26 03:48:59,009 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 11950, loss[loss=0.08739, simple_loss=0.1201, pruned_loss=0.0189, audio_tagging_loss=0.008415, over 16135.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09089, pruned_loss=0.01267, audio_tagging_loss=0.009042, over 3042729.74 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:49:12,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3205920.0, ans=0.0 2023-11-26 03:49:12,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3205920.0, ans=0.125 2023-11-26 03:49:14,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3205920.0, ans=0.2 2023-11-26 03:49:21,504 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480900 2023-11-26 03:49:44,283 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:49:53,273 INFO [train_asr.py:1235] (0/4) Epoch 40, batch 12000, loss[loss=0.05807, simple_loss=0.07521, pruned_loss=0.008403, audio_tagging_loss=0.01207, over 15102.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09001, pruned_loss=0.01239, audio_tagging_loss=0.009204, over 3049018.45 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:49:53,276 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-26 03:50:12,656 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.5078, 3.8016, 2.9387, 3.7722], device='cuda:0') 2023-11-26 03:50:15,704 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.4981, 3.8531, 2.9557, 3.7734], device='cuda:0') 2023-11-26 03:50:25,669 INFO [train_asr.py:1267] (0/4) Epoch 40, validation: loss=0.0579, simple_loss=0.05064, pruned_loss=0.005235, audio_tagging_loss=0.02734, over 4681554.00 frames. 2023-11-26 03:50:25,670 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-26 03:50:26,639 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.710e+01 8.771e+01 9.492e+01 1.018e+02 1.259e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-26 03:50:28,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3206186.6666666665, ans=0.125 2023-11-26 03:50:29,147 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.18 vs. 
limit=6.0 2023-11-26 03:50:47,184 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 480950 2023-11-26 03:50:48,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3206320.0, ans=0.1 2023-11-26 03:50:54,381 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-40.pt 2023-11-26 03:51:24,316 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 0, loss[loss=0.07148, simple_loss=0.08521, pruned_loss=0.006857, audio_tagging_loss=0.02202, over 14985.00 frames. ], tot_loss[loss=0.07148, simple_loss=0.08521, pruned_loss=0.006857, audio_tagging_loss=0.02202, over 14985.00 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 03:51:24,317 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-26 03:51:55,675 INFO [train_asr.py:1267] (0/4) Epoch 41, validation: loss=0.05811, simple_loss=0.05068, pruned_loss=0.005302, audio_tagging_loss=0.02746, over 4681554.00 frames. 2023-11-26 03:51:55,676 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-26 03:51:55,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3206360.0, ans=0.0 2023-11-26 03:52:13,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3206426.6666666665, ans=0.05 2023-11-26 03:52:16,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3206426.6666666665, ans=0.2 2023-11-26 03:52:39,824 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.44 vs. limit=12.0 2023-11-26 03:52:44,405 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481000 2023-11-26 03:52:51,488 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 50, loss[loss=0.06228, simple_loss=0.06873, pruned_loss=0.009813, audio_tagging_loss=0.0181, over 14763.00 frames. ], tot_loss[loss=0.07473, simple_loss=0.08961, pruned_loss=0.01249, audio_tagging_loss=0.01743, over 683992.78 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 03:52:52,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3206693.3333333335, ans=0.125 2023-11-26 03:53:06,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3206760.0, ans=0.125 2023-11-26 03:53:11,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3206760.0, ans=0.125 2023-11-26 03:53:17,904 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.15 vs. 
limit=15.0 2023-11-26 03:53:19,597 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.787e+01 9.351e+01 1.009e+02 1.085e+02 1.541e+02, threshold=2.017e+02, percent-clipped=0.0 2023-11-26 03:53:21,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3206826.6666666665, ans=0.0 2023-11-26 03:53:25,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3206893.3333333335, ans=0.2 2023-11-26 03:53:41,044 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481050 2023-11-26 03:53:47,340 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 100, loss[loss=0.06525, simple_loss=0.08157, pruned_loss=0.008878, audio_tagging_loss=0.01558, over 15581.00 frames. ], tot_loss[loss=0.07274, simple_loss=0.08796, pruned_loss=0.01214, audio_tagging_loss=0.01662, over 1205098.42 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 03:53:56,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3207026.6666666665, ans=0.125 2023-11-26 03:53:57,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3207093.3333333335, ans=0.0 2023-11-26 03:53:57,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3207093.3333333335, ans=0.1 2023-11-26 03:54:02,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3207093.3333333335, ans=6.0 2023-11-26 03:54:07,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3207093.3333333335, ans=0.04949747468305833 2023-11-26 03:54:28,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3207226.6666666665, ans=0.0 2023-11-26 03:54:36,741 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481100 2023-11-26 03:54:43,047 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 150, loss[loss=0.06212, simple_loss=0.07518, pruned_loss=0.01062, audio_tagging_loss=0.01391, over 15056.00 frames. ], tot_loss[loss=0.07124, simple_loss=0.08846, pruned_loss=0.01215, audio_tagging_loss=0.01486, over 1612504.94 frames. ], batch size: 59, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 03:54:51,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3207360.0, ans=0.125 2023-11-26 03:54:59,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3207426.6666666665, ans=0.025 2023-11-26 03:55:10,792 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.583e+01 9.007e+01 9.477e+01 1.014e+02 1.465e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-26 03:55:32,204 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481150 2023-11-26 03:55:38,452 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 200, loss[loss=0.07817, simple_loss=0.1042, pruned_loss=0.01883, audio_tagging_loss=0.007237, over 16130.00 frames. ], tot_loss[loss=0.0704, simple_loss=0.08989, pruned_loss=0.01242, audio_tagging_loss=0.01304, over 1929350.75 frames. 
], batch size: 59, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 03:55:43,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3207693.3333333335, ans=0.125 2023-11-26 03:55:48,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3207693.3333333335, ans=0.1 2023-11-26 03:56:04,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3207826.6666666665, ans=0.125 2023-11-26 03:56:05,811 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.61 vs. limit=15.0 2023-11-26 03:56:11,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3207893.3333333335, ans=0.125 2023-11-26 03:56:28,206 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481200 2023-11-26 03:56:35,401 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 250, loss[loss=0.08493, simple_loss=0.1093, pruned_loss=0.02, audio_tagging_loss=0.01029, over 14857.00 frames. ], tot_loss[loss=0.06981, simple_loss=0.09119, pruned_loss=0.01257, audio_tagging_loss=0.01164, over 2176538.09 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 03:56:37,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3208026.6666666665, ans=0.1 2023-11-26 03:57:04,325 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.671e+01 8.798e+01 9.430e+01 1.056e+02 1.787e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 03:57:14,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3208226.6666666665, ans=0.125 2023-11-26 03:57:24,966 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481250 2023-11-26 03:57:25,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3208293.3333333335, ans=0.125 2023-11-26 03:57:31,763 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 300, loss[loss=0.05658, simple_loss=0.06754, pruned_loss=0.01038, audio_tagging_loss=0.01242, over 14697.00 frames. ], tot_loss[loss=0.06901, simple_loss=0.09089, pruned_loss=0.01263, audio_tagging_loss=0.01093, over 2376256.48 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 03:57:46,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3208426.6666666665, ans=0.125 2023-11-26 03:57:58,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3208493.3333333335, ans=10.0 2023-11-26 03:58:20,679 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481300 2023-11-26 03:58:26,977 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 350, loss[loss=0.06855, simple_loss=0.09178, pruned_loss=0.01506, audio_tagging_loss=0.007589, over 15076.00 frames. ], tot_loss[loss=0.06892, simple_loss=0.09172, pruned_loss=0.01272, audio_tagging_loss=0.01034, over 2525159.41 frames. 
], batch size: 57, lr: 1.66e-03, grad_scale: 8.0 2023-11-26 03:58:28,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3208693.3333333335, ans=0.125 2023-11-26 03:58:36,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3208693.3333333335, ans=0.125 2023-11-26 03:58:39,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3208760.0, ans=0.0 2023-11-26 03:58:40,863 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.44 vs. limit=15.0 2023-11-26 03:58:52,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3208826.6666666665, ans=0.0 2023-11-26 03:58:56,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3208826.6666666665, ans=0.125 2023-11-26 03:58:57,802 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.712e+01 8.469e+01 9.311e+01 1.023e+02 1.499e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-26 03:58:58,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3208826.6666666665, ans=0.2 2023-11-26 03:59:01,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3208893.3333333335, ans=0.125 2023-11-26 03:59:05,831 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.10 vs. limit=15.0 2023-11-26 03:59:07,111 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.82 vs. limit=5.0 2023-11-26 03:59:16,448 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481350 2023-11-26 03:59:18,104 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.92 vs. limit=22.5 2023-11-26 03:59:19,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3208960.0, ans=0.125 2023-11-26 03:59:22,709 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 400, loss[loss=0.05155, simple_loss=0.07126, pruned_loss=0.009182, audio_tagging_loss=0.006737, over 14578.00 frames. ], tot_loss[loss=0.06783, simple_loss=0.09046, pruned_loss=0.01247, audio_tagging_loss=0.01012, over 2638036.85 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:00:11,917 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481400 2023-11-26 04:00:18,063 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.74 vs. limit=15.0 2023-11-26 04:00:19,427 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 450, loss[loss=0.05803, simple_loss=0.07757, pruned_loss=0.008463, audio_tagging_loss=0.01078, over 14475.00 frames. ], tot_loss[loss=0.06855, simple_loss=0.09187, pruned_loss=0.01287, audio_tagging_loss=0.009748, over 2723504.25 frames. 
], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:00:19,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3209360.0, ans=0.2 2023-11-26 04:00:40,363 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.31 vs. limit=5.0 2023-11-26 04:00:48,526 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.983e+01 8.492e+01 9.023e+01 9.553e+01 1.244e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-26 04:01:08,237 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481450 2023-11-26 04:01:08,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3209626.6666666665, ans=0.125 2023-11-26 04:01:14,691 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 500, loss[loss=0.07769, simple_loss=0.1098, pruned_loss=0.01664, audio_tagging_loss=0.006153, over 15882.00 frames. ], tot_loss[loss=0.06795, simple_loss=0.09107, pruned_loss=0.01292, audio_tagging_loss=0.00949, over 2798745.46 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:01:23,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3209693.3333333335, ans=0.2 2023-11-26 04:01:29,627 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.13 vs. limit=22.5 2023-11-26 04:01:34,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3209760.0, ans=0.025 2023-11-26 04:01:48,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3209893.3333333335, ans=0.1 2023-11-26 04:01:48,543 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=22.5 2023-11-26 04:01:57,988 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.49 vs. limit=15.0 2023-11-26 04:01:58,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3209960.0, ans=0.2 2023-11-26 04:01:59,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3209960.0, ans=0.0 2023-11-26 04:02:04,060 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481500 2023-11-26 04:02:08,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3209960.0, ans=0.1 2023-11-26 04:02:10,859 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 550, loss[loss=0.05358, simple_loss=0.06999, pruned_loss=0.01036, audio_tagging_loss=0.00822, over 15225.00 frames. ], tot_loss[loss=0.06773, simple_loss=0.09113, pruned_loss=0.0129, audio_tagging_loss=0.009272, over 2858838.50 frames. ], batch size: 59, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:02:16,592 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.35 vs. 
limit=22.5 2023-11-26 04:02:41,093 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.145e+01 8.521e+01 9.213e+01 9.979e+01 1.259e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-26 04:02:57,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3210293.3333333335, ans=0.1 2023-11-26 04:02:58,799 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:02:59,637 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481550 2023-11-26 04:03:01,294 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.04 vs. limit=10.0 2023-11-26 04:03:06,625 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 600, loss[loss=0.07073, simple_loss=0.09697, pruned_loss=0.01334, audio_tagging_loss=0.008897, over 17208.00 frames. ], tot_loss[loss=0.06753, simple_loss=0.0913, pruned_loss=0.01281, audio_tagging_loss=0.009071, over 2896626.57 frames. ], batch size: 67, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:03:10,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3210360.0, ans=0.1 2023-11-26 04:03:16,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3210426.6666666665, ans=0.125 2023-11-26 04:03:16,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3210426.6666666665, ans=0.2 2023-11-26 04:03:27,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3210493.3333333335, ans=0.125 2023-11-26 04:03:31,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3210493.3333333335, ans=0.125 2023-11-26 04:03:35,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3210493.3333333335, ans=0.1 2023-11-26 04:03:51,335 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.68 vs. limit=15.0 2023-11-26 04:03:55,066 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481600 2023-11-26 04:04:01,750 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 650, loss[loss=0.05535, simple_loss=0.07634, pruned_loss=0.008458, audio_tagging_loss=0.008719, over 13968.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09108, pruned_loss=0.01263, audio_tagging_loss=0.009006, over 2929405.89 frames. ], batch size: 53, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:04:06,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3210693.3333333335, ans=0.2 2023-11-26 04:04:32,178 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 8.848e+01 9.324e+01 9.991e+01 1.249e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 04:04:32,779 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.09 vs. 
limit=15.0 2023-11-26 04:04:48,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3210960.0, ans=0.0 2023-11-26 04:04:50,513 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481650 2023-11-26 04:04:57,409 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 700, loss[loss=0.05287, simple_loss=0.07319, pruned_loss=0.007627, audio_tagging_loss=0.008647, over 14783.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09004, pruned_loss=0.01239, audio_tagging_loss=0.009083, over 2962207.73 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:04:57,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3211026.6666666665, ans=0.0 2023-11-26 04:04:58,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3211026.6666666665, ans=0.125 2023-11-26 04:05:01,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3211026.6666666665, ans=0.0 2023-11-26 04:05:31,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3211226.6666666665, ans=0.125 2023-11-26 04:05:33,050 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:05:37,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3211226.6666666665, ans=0.04949747468305833 2023-11-26 04:05:42,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3211293.3333333335, ans=0.09899494936611666 2023-11-26 04:05:43,516 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.34 vs. limit=15.0 2023-11-26 04:05:46,373 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481700 2023-11-26 04:05:49,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3211293.3333333335, ans=0.125 2023-11-26 04:05:52,694 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 750, loss[loss=0.06757, simple_loss=0.08879, pruned_loss=0.01195, audio_tagging_loss=0.01123, over 14946.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08979, pruned_loss=0.01238, audio_tagging_loss=0.00911, over 2980850.88 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:06:01,766 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.50 vs. 
limit=15.0 2023-11-26 04:06:15,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3211493.3333333335, ans=0.2 2023-11-26 04:06:23,106 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.257e+01 8.760e+01 9.267e+01 1.006e+02 1.673e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-26 04:06:26,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3211560.0, ans=0.1 2023-11-26 04:06:31,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3211560.0, ans=0.09899494936611666 2023-11-26 04:06:41,752 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481750 2023-11-26 04:06:48,724 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 800, loss[loss=0.07223, simple_loss=0.09744, pruned_loss=0.01451, audio_tagging_loss=0.009002, over 15440.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09054, pruned_loss=0.01248, audio_tagging_loss=0.009111, over 2995048.40 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:07:05,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3211760.0, ans=0.0 2023-11-26 04:07:31,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3211960.0, ans=0.0 2023-11-26 04:07:37,227 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481800 2023-11-26 04:07:44,321 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 850, loss[loss=0.04896, simple_loss=0.06604, pruned_loss=0.006043, audio_tagging_loss=0.009902, over 14427.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09084, pruned_loss=0.01251, audio_tagging_loss=0.009161, over 3010051.26 frames. ], batch size: 54, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:07:47,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3212026.6666666665, ans=0.125 2023-11-26 04:07:49,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3212026.6666666665, ans=0.0 2023-11-26 04:08:01,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3212093.3333333335, ans=0.125 2023-11-26 04:08:14,111 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.717e+01 8.719e+01 9.497e+01 1.051e+02 1.257e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-26 04:08:28,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3212293.3333333335, ans=0.0 2023-11-26 04:08:32,648 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481850 2023-11-26 04:08:38,950 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 900, loss[loss=0.06214, simple_loss=0.0771, pruned_loss=0.01193, audio_tagging_loss=0.01166, over 14834.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09037, pruned_loss=0.01263, audio_tagging_loss=0.009179, over 3021030.72 frames. 
], batch size: 56, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:09:11,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3212560.0, ans=0.125 2023-11-26 04:09:16,004 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.95 vs. limit=6.0 2023-11-26 04:09:19,768 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:09:22,299 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.32 vs. limit=15.0 2023-11-26 04:09:27,769 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481900 2023-11-26 04:09:30,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3212626.6666666665, ans=0.1 2023-11-26 04:09:34,168 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 950, loss[loss=0.07357, simple_loss=0.09801, pruned_loss=0.01613, audio_tagging_loss=0.00844, over 14871.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09056, pruned_loss=0.01257, audio_tagging_loss=0.009059, over 3032880.14 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:09:42,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3212693.3333333335, ans=0.125 2023-11-26 04:09:46,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3212760.0, ans=0.1 2023-11-26 04:10:04,072 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.539e+01 8.675e+01 9.421e+01 1.013e+02 1.384e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 04:10:23,846 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 481950 2023-11-26 04:10:25,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3212960.0, ans=0.1 2023-11-26 04:10:26,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.11 vs. limit=15.0 2023-11-26 04:10:30,173 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1000, loss[loss=0.07223, simple_loss=0.09969, pruned_loss=0.01253, audio_tagging_loss=0.009866, over 13960.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09064, pruned_loss=0.01276, audio_tagging_loss=0.00897, over 3031387.51 frames. ], batch size: 53, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:10:42,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3213093.3333333335, ans=0.5 2023-11-26 04:10:54,196 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 04:11:00,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3213160.0, ans=0.125 2023-11-26 04:11:17,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3213293.3333333335, ans=0.125 2023-11-26 04:11:18,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3213293.3333333335, ans=0.2 2023-11-26 04:11:19,740 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482000 2023-11-26 04:11:26,292 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1050, loss[loss=0.08432, simple_loss=0.1231, pruned_loss=0.01788, audio_tagging_loss=0.004859, over 15561.00 frames. ], tot_loss[loss=0.06728, simple_loss=0.09117, pruned_loss=0.01288, audio_tagging_loss=0.008815, over 3037836.29 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:11:40,576 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2023-11-26 04:11:41,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3213426.6666666665, ans=0.125 2023-11-26 04:11:41,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3213426.6666666665, ans=0.125 2023-11-26 04:11:57,697 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.120e+01 8.643e+01 9.285e+01 1.025e+02 1.343e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-26 04:12:00,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3213560.0, ans=0.0 2023-11-26 04:12:11,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3213626.6666666665, ans=0.0 2023-11-26 04:12:12,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3213626.6666666665, ans=0.125 2023-11-26 04:12:16,603 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482050 2023-11-26 04:12:22,936 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1100, loss[loss=0.07425, simple_loss=0.1022, pruned_loss=0.01741, audio_tagging_loss=0.005756, over 14939.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.0907, pruned_loss=0.01271, audio_tagging_loss=0.008776, over 3038829.39 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:12:25,123 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 04:12:35,733 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.10 vs. 
limit=15.0 2023-11-26 04:12:38,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3213760.0, ans=0.09899494936611666 2023-11-26 04:12:45,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3213826.6666666665, ans=0.125 2023-11-26 04:12:52,044 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.49 vs. limit=22.5 2023-11-26 04:12:55,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3213893.3333333335, ans=0.125 2023-11-26 04:13:05,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3213960.0, ans=10.0 2023-11-26 04:13:11,062 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482100 2023-11-26 04:13:17,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3214026.6666666665, ans=0.125 2023-11-26 04:13:17,888 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1150, loss[loss=0.05081, simple_loss=0.06731, pruned_loss=0.008072, audio_tagging_loss=0.009087, over 15566.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09016, pruned_loss=0.01256, audio_tagging_loss=0.008832, over 3038151.46 frames. ], batch size: 60, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:13:22,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3214026.6666666665, ans=0.0 2023-11-26 04:13:37,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3214093.3333333335, ans=0.0 2023-11-26 04:13:39,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3214160.0, ans=0.125 2023-11-26 04:13:48,699 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.669e+01 8.737e+01 9.281e+01 9.829e+01 1.139e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 04:13:54,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3214226.6666666665, ans=0.125 2023-11-26 04:14:06,866 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482150 2023-11-26 04:14:13,217 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1200, loss[loss=0.05179, simple_loss=0.07208, pruned_loss=0.006392, audio_tagging_loss=0.009356, over 14752.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.0901, pruned_loss=0.01259, audio_tagging_loss=0.008745, over 3034909.97 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:14:27,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3214426.6666666665, ans=0.1 2023-11-26 04:14:35,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3214493.3333333335, ans=0.2 2023-11-26 04:14:43,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3214493.3333333335, ans=0.95 2023-11-26 04:14:49,632 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.05 vs. 
limit=12.0 2023-11-26 04:15:00,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3214626.6666666665, ans=0.125 2023-11-26 04:15:00,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3214626.6666666665, ans=0.2 2023-11-26 04:15:02,012 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482200 2023-11-26 04:15:07,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3214626.6666666665, ans=0.1 2023-11-26 04:15:09,166 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1250, loss[loss=0.06634, simple_loss=0.08892, pruned_loss=0.01338, audio_tagging_loss=0.008498, over 15431.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09012, pruned_loss=0.01269, audio_tagging_loss=0.008785, over 3034240.85 frames. ], batch size: 59, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:15:39,739 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.598e+01 8.848e+01 9.499e+01 1.001e+02 1.397e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 04:15:44,582 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=22.5 2023-11-26 04:15:57,551 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482250 2023-11-26 04:16:00,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3214960.0, ans=0.1 2023-11-26 04:16:03,856 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1300, loss[loss=0.07782, simple_loss=0.1119, pruned_loss=0.0127, audio_tagging_loss=0.009187, over 13877.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09052, pruned_loss=0.01274, audio_tagging_loss=0.008731, over 3032510.41 frames. ], batch size: 54, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:16:14,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3215093.3333333335, ans=10.0 2023-11-26 04:16:15,031 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.26 vs. limit=15.0 2023-11-26 04:16:29,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3215160.0, ans=0.125 2023-11-26 04:16:34,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3215160.0, ans=0.95 2023-11-26 04:16:34,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3215160.0, ans=0.1 2023-11-26 04:16:40,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3215226.6666666665, ans=0.0 2023-11-26 04:16:42,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3215226.6666666665, ans=0.125 2023-11-26 04:16:45,784 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.26 vs. 
limit=12.0 2023-11-26 04:16:53,397 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482300 2023-11-26 04:16:53,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3215293.3333333335, ans=0.1 2023-11-26 04:17:00,349 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1350, loss[loss=0.06047, simple_loss=0.07931, pruned_loss=0.01046, audio_tagging_loss=0.01035, over 14897.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09024, pruned_loss=0.01266, audio_tagging_loss=0.008802, over 3039074.23 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:17:08,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3215360.0, ans=0.125 2023-11-26 04:17:08,439 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.51 vs. limit=22.5 2023-11-26 04:17:19,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3215426.6666666665, ans=0.125 2023-11-26 04:17:31,337 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.022e+01 8.487e+01 8.991e+01 9.732e+01 2.025e+02, threshold=1.798e+02, percent-clipped=1.0 2023-11-26 04:17:37,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3215560.0, ans=0.125 2023-11-26 04:17:41,001 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 04:17:47,781 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.72 vs. limit=10.0 2023-11-26 04:17:49,431 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482350 2023-11-26 04:17:50,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3215626.6666666665, ans=0.0 2023-11-26 04:17:56,837 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1400, loss[loss=0.06321, simple_loss=0.08237, pruned_loss=0.0109, audio_tagging_loss=0.01113, over 14760.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09014, pruned_loss=0.01279, audio_tagging_loss=0.008837, over 3041457.09 frames. 
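The WARNING above is the short-utterance filter at work: AudioSet clips carry only a dummy transcript, and a 1-second cut leaves 23 frames after the 4x subsampling, fewer than its 24 BPE tokens, so the pruned transducer cannot align it and the cut is excluded. A sketch of that check, consistent with the logged numbers (helper names are illustrative; the actual filter lives in train_asr.py):

```python
import logging

def frames_after_subsampling(num_frames: int) -> int:
    # Conv2dSubsampling-style 4x reduction; reproduces the logged
    # mapping 100 -> 23, since ((100 - 7) // 2 + 1) // 2 == 23.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(cut, sp) -> bool:
    """Drop cuts whose BPE token sequence is longer than the frame
    sequence seen by the transducer (T < U cannot be aligned)."""
    T = frames_after_subsampling(cut.num_frames)
    tokens = sp.encode(cut.supervisions[0].text, out_type=str)
    if T < len(tokens):
        logging.warning(
            f"Exclude cut with ID {cut.id} from training. "
            f"Number of frames (after subsampling): {T}. "
            f"Number of tokens: {len(tokens)}"
        )
        return False
    return True
```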
], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:18:09,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3215760.0, ans=0.0 2023-11-26 04:18:26,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3215826.6666666665, ans=0.125 2023-11-26 04:18:37,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3215893.3333333335, ans=0.125 2023-11-26 04:18:45,874 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482400 2023-11-26 04:18:52,445 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1450, loss[loss=0.07227, simple_loss=0.09152, pruned_loss=0.01524, audio_tagging_loss=0.01127, over 14296.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.08927, pruned_loss=0.01273, audio_tagging_loss=0.008877, over 3039558.19 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:18:57,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3216026.6666666665, ans=0.125 2023-11-26 04:19:17,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3216160.0, ans=0.0 2023-11-26 04:19:20,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3216160.0, ans=0.1 2023-11-26 04:19:24,157 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.356e+01 8.656e+01 9.210e+01 9.975e+01 1.432e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-26 04:19:35,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3216226.6666666665, ans=0.2 2023-11-26 04:19:41,264 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482450 2023-11-26 04:19:48,065 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1500, loss[loss=0.0699, simple_loss=0.08688, pruned_loss=0.01269, audio_tagging_loss=0.01376, over 15328.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.0898, pruned_loss=0.01268, audio_tagging_loss=0.008974, over 3037892.97 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:19:51,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3216360.0, ans=0.125 2023-11-26 04:19:54,531 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.78 vs. 
limit=15.0 2023-11-26 04:20:04,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3216426.6666666665, ans=0.125 2023-11-26 04:20:16,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3216493.3333333335, ans=0.125 2023-11-26 04:20:16,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3216493.3333333335, ans=0.1 2023-11-26 04:20:19,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3216493.3333333335, ans=0.125 2023-11-26 04:20:23,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3216560.0, ans=0.125 2023-11-26 04:20:32,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3216626.6666666665, ans=0.125 2023-11-26 04:20:37,475 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482500 2023-11-26 04:20:44,830 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1550, loss[loss=0.06925, simple_loss=0.09258, pruned_loss=0.01379, audio_tagging_loss=0.009169, over 14720.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.08955, pruned_loss=0.01261, audio_tagging_loss=0.009168, over 3031449.83 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:20:46,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3216693.3333333335, ans=0.125 2023-11-26 04:20:46,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3216693.3333333335, ans=0.125 2023-11-26 04:20:52,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3216693.3333333335, ans=0.125 2023-11-26 04:20:57,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3216760.0, ans=0.125 2023-11-26 04:21:02,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3216760.0, ans=10.0 2023-11-26 04:21:03,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3216760.0, ans=0.125 2023-11-26 04:21:05,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3216826.6666666665, ans=0.0 2023-11-26 04:21:15,038 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.470e+01 8.764e+01 9.258e+01 1.010e+02 1.215e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-26 04:21:22,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3216893.3333333335, ans=0.0 2023-11-26 04:21:22,726 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.16 vs. 
limit=12.0 2023-11-26 04:21:23,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3216893.3333333335, ans=0.0 2023-11-26 04:21:33,700 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482550 2023-11-26 04:21:40,015 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1600, loss[loss=0.06785, simple_loss=0.09405, pruned_loss=0.01228, audio_tagging_loss=0.008547, over 15293.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.08972, pruned_loss=0.01246, audio_tagging_loss=0.009199, over 3039655.21 frames. ], batch size: 59, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:21:44,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3217026.6666666665, ans=0.0 2023-11-26 04:21:54,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3217093.3333333335, ans=0.125 2023-11-26 04:22:28,859 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482600 2023-11-26 04:22:36,041 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1650, loss[loss=0.07362, simple_loss=0.09486, pruned_loss=0.01465, audio_tagging_loss=0.01154, over 17202.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.0899, pruned_loss=0.01239, audio_tagging_loss=0.009165, over 3046661.56 frames. ], batch size: 62, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:22:38,476 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:22:51,513 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.51 vs. limit=22.5 2023-11-26 04:22:54,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3217426.6666666665, ans=0.125 2023-11-26 04:23:07,727 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.278e+01 8.467e+01 9.120e+01 9.826e+01 1.173e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-26 04:23:19,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3217626.6666666665, ans=0.125 2023-11-26 04:23:21,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3217626.6666666665, ans=0.2 2023-11-26 04:23:24,375 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482650 2023-11-26 04:23:27,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3217626.6666666665, ans=0.125 2023-11-26 04:23:28,086 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.93 vs. limit=15.0 2023-11-26 04:23:31,214 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1700, loss[loss=0.06868, simple_loss=0.09539, pruned_loss=0.01073, audio_tagging_loss=0.01026, over 15871.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09093, pruned_loss=0.01259, audio_tagging_loss=0.009164, over 3049314.28 frames. 
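Most entries in this log come from ScheduledFloat objects in scaling.py: each named hyper-parameter (dropout_p, skip rates, balancer probabilities, min/max bounds) is a piecewise-linear function of the global batch count, and each line prints the value ("ans") resolved at the current batch_count. A minimal sketch of that behavior; the schedule points below are illustrative, not the ones used in this run:

```python
class ScheduledFloat:
    """A float that is a piecewise-linear function of a batch counter."""

    def __init__(self, *points) -> None:
        # points: (batch_count, value) pairs defining the schedule.
        self.points = sorted(points)
        self.batch_count = 0.0

    def __float__(self) -> float:
        x, p = self.batch_count, self.points
        if x <= p[0][0]:
            return p[0][1]
        if x >= p[-1][0]:
            return p[-1][1]
        for (x0, y0), (x1, y1) in zip(p, p[1:]):
            if x0 <= x <= x1:  # linear interpolation inside the segment
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# A skip-rate decaying from 0.5 to 0.0 over the first 20k batches:
rate = ScheduledFloat((0.0, 0.5), (20000.0, 0.0))
rate.batch_count = 3214026.0
print(float(rate))  # 0.0 -- cf. "...pos_emb_skip_rate, ..., ans=0.0" above
```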
], batch size: 59, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:23:35,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3217693.3333333335, ans=0.2 2023-11-26 04:24:00,355 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.31 vs. limit=15.0 2023-11-26 04:24:09,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3217893.3333333335, ans=0.1 2023-11-26 04:24:20,455 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482700 2023-11-26 04:24:24,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3217960.0, ans=0.07 2023-11-26 04:24:26,772 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1750, loss[loss=0.04163, simple_loss=0.05419, pruned_loss=0.005078, audio_tagging_loss=0.009457, over 13771.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09043, pruned_loss=0.01233, audio_tagging_loss=0.009109, over 3050130.00 frames. ], batch size: 54, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:24:47,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3218160.0, ans=0.5 2023-11-26 04:24:59,627 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.364e+01 8.710e+01 9.428e+01 1.004e+02 1.247e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 04:25:12,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3218293.3333333335, ans=0.0 2023-11-26 04:25:14,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3218293.3333333335, ans=0.1 2023-11-26 04:25:15,536 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482750 2023-11-26 04:25:22,315 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1800, loss[loss=0.05404, simple_loss=0.07581, pruned_loss=0.006622, audio_tagging_loss=0.009519, over 15857.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09007, pruned_loss=0.01223, audio_tagging_loss=0.009019, over 3052304.47 frames. 
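The recurring "Clipping_scale=2.0, grad-norm quartiles ..." lines are internally consistent: the threshold is always the clipping scale times the logged median (here 2.0 x 9.428e+01 = 1.886e+02), and percent-clipped reports how often the total gradient norm exceeded it. A hedged sketch of such a clipper; the window size and the cumulative percent-clipped accounting are assumptions, not the exact optim.py bookkeeping:

```python
import torch

class GradNormClipper:
    """Clip gradients to clipping_scale * median of recent grad norms."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 200):
        self.scale, self.window = clipping_scale, window
        self.norms = []
        self.num_seen = self.num_clipped = 0

    def clip_(self, params) -> None:
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms = (self.norms + [norm])[-self.window:]
        q = torch.quantile(
            torch.tensor(self.norms),
            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]),
        )
        threshold = self.scale * q[2].item()  # 2.0 * median
        self.num_seen += 1
        if norm > threshold:
            self.num_clipped += 1
            for g in grads:
                g.mul_(threshold / norm)
        print(
            f"Clipping_scale={self.scale}, grad-norm quartiles "
            + " ".join(f"{v:.3e}" for v in q.tolist())
            + f", threshold={threshold:.3e}, percent-clipped="
            + f"{100.0 * self.num_clipped / self.num_seen:.1f}"
        )
```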
], batch size: 58, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:25:25,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3218360.0, ans=0.125 2023-11-26 04:25:35,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3218426.6666666665, ans=0.1 2023-11-26 04:26:07,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3218626.6666666665, ans=0.0 2023-11-26 04:26:09,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3218626.6666666665, ans=0.07 2023-11-26 04:26:10,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3218626.6666666665, ans=0.95 2023-11-26 04:26:11,627 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482800 2023-11-26 04:26:13,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3218626.6666666665, ans=10.0 2023-11-26 04:26:15,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3218626.6666666665, ans=0.125 2023-11-26 04:26:18,153 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1850, loss[loss=0.0753, simple_loss=0.1087, pruned_loss=0.01262, audio_tagging_loss=0.008336, over 15528.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08919, pruned_loss=0.01226, audio_tagging_loss=0.009013, over 3044961.08 frames. ], batch size: 59, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:26:28,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3218693.3333333335, ans=0.125 2023-11-26 04:26:39,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3218760.0, ans=0.2 2023-11-26 04:26:51,386 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.983e+01 8.663e+01 9.346e+01 1.025e+02 1.313e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 04:26:51,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3218893.3333333335, ans=0.125 2023-11-26 04:26:53,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3218893.3333333335, ans=0.0 2023-11-26 04:27:05,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3218960.0, ans=0.5 2023-11-26 04:27:08,528 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482850 2023-11-26 04:27:15,284 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1900, loss[loss=0.06193, simple_loss=0.08844, pruned_loss=0.009624, audio_tagging_loss=0.008087, over 15488.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08902, pruned_loss=0.01222, audio_tagging_loss=0.008928, over 3051682.62 frames. 
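Each "Epoch 41, batch ..." record decomposes the total exactly: loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss, i.e. the simple (trivial-joiner) transducer loss enters at its configured scale of 0.5 while the pruned loss and the audio-tagging distillation loss enter at scale 1.0. Checking this against the batch-1900 record above:

```python
# Numbers copied from the batch-1900 record above.
simple_loss, pruned_loss, audio_tagging_loss = 0.08844, 0.009624, 0.008087
loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
print(f"{loss:.5f}")  # 0.06193 -- matches loss[loss=0.06193, ...]
```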
], batch size: 57, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:27:35,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3219093.3333333335, ans=0.0 2023-11-26 04:27:35,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3219093.3333333335, ans=0.2 2023-11-26 04:27:47,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3219226.6666666665, ans=0.125 2023-11-26 04:27:48,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3219226.6666666665, ans=0.5 2023-11-26 04:27:52,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3219226.6666666665, ans=0.125 2023-11-26 04:27:56,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3219226.6666666665, ans=0.2 2023-11-26 04:28:04,385 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482900 2023-11-26 04:28:07,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3219293.3333333335, ans=0.0 2023-11-26 04:28:08,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3219293.3333333335, ans=0.07 2023-11-26 04:28:09,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3219360.0, ans=0.0 2023-11-26 04:28:11,346 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 1950, loss[loss=0.06945, simple_loss=0.08905, pruned_loss=0.01323, audio_tagging_loss=0.01169, over 14289.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08916, pruned_loss=0.01215, audio_tagging_loss=0.008924, over 3054152.58 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:28:43,714 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.058e+01 8.481e+01 9.198e+01 9.869e+01 1.193e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-26 04:28:55,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3219626.6666666665, ans=0.125 2023-11-26 04:29:00,263 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 482950 2023-11-26 04:29:04,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3219626.6666666665, ans=0.125 2023-11-26 04:29:06,468 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2000, loss[loss=0.07036, simple_loss=0.1069, pruned_loss=0.01076, audio_tagging_loss=0.006142, over 15425.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08872, pruned_loss=0.01212, audio_tagging_loss=0.008896, over 3047608.27 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:29:06,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3219693.3333333335, ans=0.125 2023-11-26 04:29:12,464 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.89 vs. 
limit=15.0 2023-11-26 04:29:17,973 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.70 vs. limit=22.5 2023-11-26 04:29:36,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3219826.6666666665, ans=0.125 2023-11-26 04:29:42,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3219893.3333333335, ans=0.0 2023-11-26 04:29:56,794 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483000 2023-11-26 04:30:03,468 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2050, loss[loss=0.06889, simple_loss=0.0948, pruned_loss=0.01345, audio_tagging_loss=0.008042, over 14845.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08947, pruned_loss=0.01218, audio_tagging_loss=0.008872, over 3041684.17 frames. ], batch size: 53, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:30:23,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3220093.3333333335, ans=0.0 2023-11-26 04:30:36,115 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.199e+01 8.605e+01 9.268e+01 1.003e+02 1.182e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-26 04:30:43,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3220226.6666666665, ans=0.125 2023-11-26 04:30:43,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3220226.6666666665, ans=0.125 2023-11-26 04:30:53,353 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483050 2023-11-26 04:30:55,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3220293.3333333335, ans=0.125 2023-11-26 04:30:59,846 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2100, loss[loss=0.06028, simple_loss=0.08292, pruned_loss=0.007952, audio_tagging_loss=0.01087, over 14594.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.0896, pruned_loss=0.01214, audio_tagging_loss=0.008829, over 3040932.38 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:31:14,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3220426.6666666665, ans=0.125 2023-11-26 04:31:19,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3220426.6666666665, ans=0.125 2023-11-26 04:31:25,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3220493.3333333335, ans=0.125 2023-11-26 04:31:46,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3220626.6666666665, ans=0.1 2023-11-26 04:31:49,117 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483100 2023-11-26 04:31:55,363 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2150, loss[loss=0.08455, simple_loss=0.1168, pruned_loss=0.01765, audio_tagging_loss=0.008469, over 15793.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09038, pruned_loss=0.01243, audio_tagging_loss=0.008913, over 3041442.76 frames. 
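The reported learning rate also follows from the counters. Under the Eden schedule, lr = base_lr * ((b^2 + B^2)/B^2)^(-1/4) * ((e^2 + E^2)/E^2)^(-1/4), with batch count b, epoch count e, and this run's base_lr = 0.045, lr_batches B = 7500, lr_epochs E = 3.5. Taking the scheduler's epoch counter as roughly 40 at this point (an approximation) reproduces the logged value, and the slow decay of both factors explains the drift from "lr: 1.66e-03" to "lr: 1.65e-03" later in this section:

```python
base_lr, B, E = 0.045, 7500.0, 3.5
b, e = 483_000.0, 40.0  # approximate counters at this point in the run
lr = base_lr * ((b**2 + B**2) / B**2) ** -0.25 * ((e**2 + E**2) / E**2) ** -0.25
print(f"lr: {lr:.2e}")  # lr: 1.66e-03
```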
], batch size: 58, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:31:59,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3220693.3333333335, ans=0.125 2023-11-26 04:32:10,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3220760.0, ans=0.125 2023-11-26 04:32:19,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3220826.6666666665, ans=0.125 2023-11-26 04:32:28,584 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.889e+01 8.606e+01 9.465e+01 1.020e+02 1.219e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 04:32:29,700 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 04:32:31,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3220893.3333333335, ans=0.0 2023-11-26 04:32:42,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3220960.0, ans=0.0 2023-11-26 04:32:45,730 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483150 2023-11-26 04:32:51,160 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:32:52,094 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2200, loss[loss=0.07802, simple_loss=0.1103, pruned_loss=0.01464, audio_tagging_loss=0.008245, over 15320.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08961, pruned_loss=0.01229, audio_tagging_loss=0.008887, over 3029841.69 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:32:54,845 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.62 vs. 
limit=22.5 2023-11-26 04:33:05,694 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:33:08,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3221093.3333333335, ans=0.0 2023-11-26 04:33:11,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3221093.3333333335, ans=0.0 2023-11-26 04:33:12,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3221093.3333333335, ans=0.125 2023-11-26 04:33:18,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3221160.0, ans=0.125 2023-11-26 04:33:20,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3221160.0, ans=0.125 2023-11-26 04:33:30,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3221226.6666666665, ans=0.125 2023-11-26 04:33:41,022 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483200 2023-11-26 04:33:44,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3221293.3333333335, ans=0.0 2023-11-26 04:33:47,556 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2250, loss[loss=0.07048, simple_loss=0.09611, pruned_loss=0.01323, audio_tagging_loss=0.009195, over 14941.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09024, pruned_loss=0.01221, audio_tagging_loss=0.008824, over 3036516.74 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:34:13,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3221493.3333333335, ans=0.05 2023-11-26 04:34:21,559 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 8.817e+01 9.211e+01 9.808e+01 1.275e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-26 04:34:22,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3221560.0, ans=0.0 2023-11-26 04:34:30,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3221560.0, ans=0.2 2023-11-26 04:34:35,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3221626.6666666665, ans=0.125 2023-11-26 04:34:37,731 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483250 2023-11-26 04:34:41,465 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.26 vs. limit=15.0 2023-11-26 04:34:44,126 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2300, loss[loss=0.07368, simple_loss=0.09773, pruned_loss=0.01456, audio_tagging_loss=0.01026, over 15567.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09107, pruned_loss=0.01244, audio_tagging_loss=0.008788, over 3039114.93 frames. 
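The "Whitening: name=..., metric=M vs. limit=L" lines are covariance diagnostics on intermediate activations. One consistent reading of the metric is the mean of the squared eigenvalues of the per-group channel covariance divided by the squared mean eigenvalue: it equals 1.0 exactly when the features are white, grows as the covariance becomes anisotropic, and a penalty is applied only when it exceeds its limit. A sketch under that assumption (an interpretation of the logged quantity, not the verbatim scaling.py code):

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    """x: (num_frames, num_channels). Returns mean(eig^2) / mean(eig)^2 of
    the per-group channel covariance; 1.0 iff the features are white."""
    num_frames, num_channels = x.shape
    c = num_channels // num_groups
    x = x.reshape(num_frames, num_groups, c).transpose(0, 1)
    covar = x.transpose(1, 2) @ x / num_frames            # (num_groups, c, c)
    mean_eig = covar.diagonal(dim1=1, dim2=2).mean()      # trace / c
    mean_eig_sq = (covar**2).sum(dim=(1, 2)).mean() / c   # Frobenius^2 / c
    return mean_eig_sq / mean_eig**2

x = torch.randn(10000, 384)                # already-white input
print(whitening_metric(x, num_groups=1))   # ~1.0, well under e.g. limit=22.5
```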
], batch size: 58, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:34:55,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3221760.0, ans=0.1 2023-11-26 04:35:05,833 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.98 vs. limit=15.0 2023-11-26 04:35:16,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3221893.3333333335, ans=0.0 2023-11-26 04:35:27,939 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.96 vs. limit=15.0 2023-11-26 04:35:30,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3221960.0, ans=0.1 2023-11-26 04:35:32,585 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 04:35:32,634 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483300 2023-11-26 04:35:40,082 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2350, loss[loss=0.05982, simple_loss=0.07782, pruned_loss=0.01082, audio_tagging_loss=0.01009, over 15389.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.09189, pruned_loss=0.01251, audio_tagging_loss=0.008837, over 3037486.83 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 04:35:41,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3222026.6666666665, ans=0.125 2023-11-26 04:36:13,184 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 8.775e+01 9.413e+01 9.957e+01 1.252e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-26 04:36:14,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3222226.6666666665, ans=0.125 2023-11-26 04:36:17,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3222226.6666666665, ans=0.04949747468305833 2023-11-26 04:36:29,221 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483350 2023-11-26 04:36:32,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3222293.3333333335, ans=0.125 2023-11-26 04:36:34,094 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.70 vs. limit=15.0 2023-11-26 04:36:35,664 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2400, loss[loss=0.08048, simple_loss=0.1141, pruned_loss=0.01884, audio_tagging_loss=0.004591, over 15574.00 frames. ], tot_loss[loss=0.06751, simple_loss=0.09202, pruned_loss=0.01269, audio_tagging_loss=0.008814, over 3043361.44 frames. 
], batch size: 58, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:36:44,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3222360.0, ans=0.0 2023-11-26 04:36:46,722 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.34 vs. limit=15.0 2023-11-26 04:36:47,953 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.49 vs. limit=15.0 2023-11-26 04:36:59,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3222493.3333333335, ans=0.0 2023-11-26 04:37:02,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3222493.3333333335, ans=0.125 2023-11-26 04:37:18,554 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.42 vs. limit=12.0 2023-11-26 04:37:20,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3222626.6666666665, ans=0.125 2023-11-26 04:37:24,486 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483400 2023-11-26 04:37:32,122 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2450, loss[loss=0.04176, simple_loss=0.05342, pruned_loss=0.006022, audio_tagging_loss=0.009025, over 16029.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09121, pruned_loss=0.01256, audio_tagging_loss=0.008895, over 3052767.20 frames. ], batch size: 62, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:37:34,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3222693.3333333335, ans=0.0 2023-11-26 04:37:37,853 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.89 vs. limit=15.0 2023-11-26 04:37:59,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3222826.6666666665, ans=0.0 2023-11-26 04:38:05,005 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.815e+01 8.820e+01 9.460e+01 9.914e+01 1.229e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-26 04:38:13,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3222893.3333333335, ans=0.125 2023-11-26 04:38:16,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3222960.0, ans=10.0 2023-11-26 04:38:20,993 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483450 2023-11-26 04:38:25,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3222960.0, ans=0.125 2023-11-26 04:38:28,468 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2500, loss[loss=0.09363, simple_loss=0.1321, pruned_loss=0.02194, audio_tagging_loss=0.005647, over 15248.00 frames. ], tot_loss[loss=0.06756, simple_loss=0.09177, pruned_loss=0.01276, audio_tagging_loss=0.008921, over 3051477.82 frames. 
], batch size: 54, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:39:08,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3223226.6666666665, ans=0.125 2023-11-26 04:39:17,448 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483500 2023-11-26 04:39:23,663 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2550, loss[loss=0.05585, simple_loss=0.07407, pruned_loss=0.009968, audio_tagging_loss=0.008846, over 14871.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09093, pruned_loss=0.01264, audio_tagging_loss=0.0089, over 3047171.56 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:39:40,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3223426.6666666665, ans=0.125 2023-11-26 04:39:58,037 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.362e+01 8.655e+01 9.369e+01 9.898e+01 1.233e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-26 04:39:58,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3223560.0, ans=0.2 2023-11-26 04:40:05,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3223560.0, ans=0.125 2023-11-26 04:40:06,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3223560.0, ans=0.1 2023-11-26 04:40:13,224 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483550 2023-11-26 04:40:20,119 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2600, loss[loss=0.0486, simple_loss=0.06668, pruned_loss=0.005402, audio_tagging_loss=0.009857, over 14928.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.08978, pruned_loss=0.01245, audio_tagging_loss=0.008864, over 3050049.82 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:40:27,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3223693.3333333335, ans=0.0 2023-11-26 04:40:29,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3223693.3333333335, ans=0.0 2023-11-26 04:41:09,395 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483600 2023-11-26 04:41:17,196 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2650, loss[loss=0.07871, simple_loss=0.118, pruned_loss=0.0135, audio_tagging_loss=0.006236, over 15218.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09014, pruned_loss=0.01261, audio_tagging_loss=0.008733, over 3049454.01 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:41:24,829 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.39 vs. 
limit=10.0 2023-11-26 04:41:39,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3224160.0, ans=0.0 2023-11-26 04:41:40,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3224160.0, ans=0.125 2023-11-26 04:41:50,091 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.141e+01 8.492e+01 9.203e+01 1.002e+02 1.237e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-26 04:42:06,582 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483650 2023-11-26 04:42:12,945 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2700, loss[loss=0.05948, simple_loss=0.08352, pruned_loss=0.01035, audio_tagging_loss=0.007374, over 15165.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08991, pruned_loss=0.0125, audio_tagging_loss=0.008718, over 3055458.38 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 04:42:13,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3224360.0, ans=0.1 2023-11-26 04:42:15,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3224360.0, ans=0.125 2023-11-26 04:42:25,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3224426.6666666665, ans=0.0 2023-11-26 04:42:28,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3224426.6666666665, ans=0.125 2023-11-26 04:42:45,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3224493.3333333335, ans=0.125 2023-11-26 04:42:49,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3224560.0, ans=0.0 2023-11-26 04:42:49,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3224560.0, ans=0.0 2023-11-26 04:42:56,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3224560.0, ans=0.0 2023-11-26 04:43:02,275 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483700 2023-11-26 04:43:08,495 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2750, loss[loss=0.06292, simple_loss=0.09172, pruned_loss=0.007593, audio_tagging_loss=0.009465, over 16426.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09032, pruned_loss=0.01265, audio_tagging_loss=0.00875, over 3058823.57 frames. 
], batch size: 61, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 04:43:31,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3224826.6666666665, ans=0.1 2023-11-26 04:43:43,683 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.484e+01 8.913e+01 9.370e+01 9.874e+01 1.312e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-26 04:43:47,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3224893.3333333335, ans=0.0 2023-11-26 04:43:49,210 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:43:55,859 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 04:43:57,991 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483750 2023-11-26 04:44:04,798 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2800, loss[loss=0.06633, simple_loss=0.08167, pruned_loss=0.01425, audio_tagging_loss=0.01125, over 14800.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09015, pruned_loss=0.01261, audio_tagging_loss=0.008703, over 3053543.83 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:44:14,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3225026.6666666665, ans=0.0 2023-11-26 04:44:17,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3225093.3333333335, ans=0.0 2023-11-26 04:44:19,133 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.80 vs. limit=22.5 2023-11-26 04:44:20,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3225093.3333333335, ans=0.0 2023-11-26 04:44:32,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3225160.0, ans=0.125 2023-11-26 04:44:32,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3225160.0, ans=0.1 2023-11-26 04:44:35,751 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.17 vs. 
limit=22.5 2023-11-26 04:44:36,626 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:44:37,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3225226.6666666665, ans=0.125 2023-11-26 04:44:44,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3225226.6666666665, ans=0.125 2023-11-26 04:44:55,136 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483800 2023-11-26 04:45:01,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3225360.0, ans=0.125 2023-11-26 04:45:01,806 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2850, loss[loss=0.0651, simple_loss=0.08404, pruned_loss=0.01329, audio_tagging_loss=0.009789, over 15100.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08897, pruned_loss=0.01238, audio_tagging_loss=0.008672, over 3040988.67 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:45:05,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3225360.0, ans=0.0 2023-11-26 04:45:11,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3225426.6666666665, ans=0.125 2023-11-26 04:45:36,994 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.154e+01 8.846e+01 9.347e+01 1.008e+02 1.244e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 04:45:40,875 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.35 vs. limit=12.0 2023-11-26 04:45:48,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3225626.6666666665, ans=0.125 2023-11-26 04:45:50,801 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483850 2023-11-26 04:45:50,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3225626.6666666665, ans=0.2 2023-11-26 04:45:57,141 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2900, loss[loss=0.05695, simple_loss=0.07557, pruned_loss=0.008792, audio_tagging_loss=0.01037, over 16696.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08992, pruned_loss=0.0124, audio_tagging_loss=0.008648, over 3047281.37 frames. ], batch size: 64, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:46:00,341 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.68 vs. limit=15.0 2023-11-26 04:46:02,546 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.10 vs. limit=22.5 2023-11-26 04:46:09,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3225760.0, ans=0.1 2023-11-26 04:46:30,916 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.28 vs. 
limit=15.0 2023-11-26 04:46:46,738 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483900 2023-11-26 04:46:47,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3225960.0, ans=0.125 2023-11-26 04:46:52,985 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 2950, loss[loss=0.05642, simple_loss=0.076, pruned_loss=0.009925, audio_tagging_loss=0.008499, over 15643.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09018, pruned_loss=0.01244, audio_tagging_loss=0.00868, over 3048095.56 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:46:53,612 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0 2023-11-26 04:47:02,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3226026.6666666665, ans=0.0 2023-11-26 04:47:17,297 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.60 vs. limit=10.0 2023-11-26 04:47:21,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3226160.0, ans=0.1 2023-11-26 04:47:27,901 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.443e+01 8.828e+01 9.406e+01 1.023e+02 1.338e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 04:47:29,612 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.03 vs. limit=15.0 2023-11-26 04:47:40,657 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.52 vs. limit=22.5 2023-11-26 04:47:40,724 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.92 vs. limit=15.0 2023-11-26 04:47:42,887 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 483950 2023-11-26 04:47:47,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3226293.3333333335, ans=0.0 2023-11-26 04:47:49,771 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3000, loss[loss=0.04655, simple_loss=0.05227, pruned_loss=0.009713, audio_tagging_loss=0.0107, over 14803.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08962, pruned_loss=0.01237, audio_tagging_loss=0.008792, over 3044183.47 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:47:49,773 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-26 04:48:04,463 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.0660, 5.8246, 5.6125, 5.5745], device='cuda:0') 2023-11-26 04:48:22,233 INFO [train_asr.py:1267] (0/4) Epoch 41, validation: loss=0.05755, simple_loss=0.05064, pruned_loss=0.005227, audio_tagging_loss=0.02701, over 4681554.00 frames. 2023-11-26 04:48:22,233 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-26 04:48:23,942 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.20 vs. 
limit=15.0 2023-11-26 04:49:00,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3226560.0, ans=0.125 2023-11-26 04:49:11,257 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484000 2023-11-26 04:49:12,562 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-484000.pt 2023-11-26 04:49:20,475 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3050, loss[loss=0.0842, simple_loss=0.121, pruned_loss=0.01738, audio_tagging_loss=0.006337, over 15513.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.0898, pruned_loss=0.01234, audio_tagging_loss=0.008832, over 3044538.42 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:49:26,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3226693.3333333335, ans=0.125 2023-11-26 04:49:36,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3226760.0, ans=0.125 2023-11-26 04:49:49,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3226826.6666666665, ans=0.125 2023-11-26 04:49:52,010 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 04:49:54,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3226893.3333333335, ans=0.125 2023-11-26 04:49:55,239 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 8.733e+01 9.255e+01 1.004e+02 1.259e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-26 04:49:55,683 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=22.5 2023-11-26 04:50:03,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3226893.3333333335, ans=0.0 2023-11-26 04:50:07,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3226960.0, ans=0.1 2023-11-26 04:50:10,294 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484050 2023-11-26 04:50:17,057 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3100, loss[loss=0.06685, simple_loss=0.094, pruned_loss=0.01114, audio_tagging_loss=0.008712, over 15873.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09077, pruned_loss=0.01265, audio_tagging_loss=0.008809, over 3047921.68 frames. 
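The save at 04:49:12 lands exactly on a multiple of the configured save interval: checkpoint-{batch_idx}.pt files are written every save_every_n = 4000 training batches (484000 = 121 x 4000), alongside the per-epoch epoch-N.pt files. A minimal sketch of that cadence (the rank-0 guard and model-averaging bookkeeping are omitted):

```python
def checkpoint_due(batch_idx_train: int, save_every_n: int = 4000) -> bool:
    """True on batches where a mid-epoch checkpoint should be written."""
    return batch_idx_train > 0 and batch_idx_train % save_every_n == 0

assert checkpoint_due(484000)        # -> .../checkpoint-484000.pt
assert not checkpoint_due(483999)
```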
], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:50:18,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3227026.6666666665, ans=0.125 2023-11-26 04:50:24,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3227026.6666666665, ans=0.125 2023-11-26 04:50:38,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3227160.0, ans=0.2 2023-11-26 04:50:50,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3227226.6666666665, ans=0.0 2023-11-26 04:50:52,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=3227226.6666666665, ans=0.2 2023-11-26 04:51:06,106 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484100 2023-11-26 04:51:12,218 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.52 vs. limit=15.0 2023-11-26 04:51:12,455 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3150, loss[loss=0.06859, simple_loss=0.0969, pruned_loss=0.01181, audio_tagging_loss=0.008331, over 15815.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09095, pruned_loss=0.01257, audio_tagging_loss=0.008892, over 3049655.05 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 04:51:18,470 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:51:19,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3227360.0, ans=0.125 2023-11-26 04:51:20,144 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.13 vs. limit=15.0 2023-11-26 04:51:25,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3227426.6666666665, ans=0.125 2023-11-26 04:51:32,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3227426.6666666665, ans=0.1 2023-11-26 04:51:41,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3227493.3333333335, ans=0.1 2023-11-26 04:51:48,598 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.901e+01 8.686e+01 9.278e+01 1.012e+02 1.304e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 04:51:49,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3227560.0, ans=0.125 2023-11-26 04:51:55,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3227560.0, ans=0.0 2023-11-26 04:52:01,922 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484150 2023-11-26 04:52:08,290 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3200, loss[loss=0.08234, simple_loss=0.1164, pruned_loss=0.01679, audio_tagging_loss=0.007369, over 16093.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09042, pruned_loss=0.01229, audio_tagging_loss=0.008988, over 3056864.27 frames. 
], batch size: 58, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:52:14,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3227693.3333333335, ans=0.125 2023-11-26 04:52:19,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3227760.0, ans=0.125 2023-11-26 04:52:34,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3227826.6666666665, ans=0.2 2023-11-26 04:52:40,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3227826.6666666665, ans=0.0 2023-11-26 04:52:48,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3227893.3333333335, ans=0.125 2023-11-26 04:52:56,062 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.53 vs. limit=6.0 2023-11-26 04:52:57,685 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484200 2023-11-26 04:53:04,805 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3250, loss[loss=0.06239, simple_loss=0.07682, pruned_loss=0.009597, audio_tagging_loss=0.01439, over 15419.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09038, pruned_loss=0.01219, audio_tagging_loss=0.009174, over 3053330.66 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:53:13,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3228026.6666666665, ans=0.2 2023-11-26 04:53:28,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3228160.0, ans=0.0 2023-11-26 04:53:32,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3228160.0, ans=0.125 2023-11-26 04:53:38,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3228226.6666666665, ans=0.125 2023-11-26 04:53:40,492 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.780e+01 8.676e+01 9.295e+01 9.800e+01 1.223e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-26 04:53:44,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3228226.6666666665, ans=0.1 2023-11-26 04:53:54,374 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484250 2023-11-26 04:54:00,676 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3300, loss[loss=0.06048, simple_loss=0.08198, pruned_loss=0.009889, audio_tagging_loss=0.009594, over 15512.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09007, pruned_loss=0.01221, audio_tagging_loss=0.009163, over 3051050.73 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:54:07,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3228360.0, ans=0.125 2023-11-26 04:54:09,400 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.70 vs. 
limit=10.0 2023-11-26 04:54:13,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3228426.6666666665, ans=0.0 2023-11-26 04:54:20,376 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.80 vs. limit=15.0 2023-11-26 04:54:21,424 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.17 vs. limit=15.0 2023-11-26 04:54:29,444 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.29 vs. limit=15.0 2023-11-26 04:54:38,314 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.38 vs. limit=22.5 2023-11-26 04:54:44,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3228626.6666666665, ans=0.05 2023-11-26 04:54:50,298 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484300 2023-11-26 04:54:56,667 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3350, loss[loss=0.06528, simple_loss=0.08063, pruned_loss=0.01508, audio_tagging_loss=0.009884, over 15539.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09109, pruned_loss=0.01244, audio_tagging_loss=0.008987, over 3057398.15 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:55:04,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3228693.3333333335, ans=0.5 2023-11-26 04:55:06,374 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.18 vs. limit=22.5 2023-11-26 04:55:09,963 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=15.0 2023-11-26 04:55:10,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3228760.0, ans=0.5 2023-11-26 04:55:22,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3228826.6666666665, ans=0.125 2023-11-26 04:55:31,062 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.88 vs. limit=12.0 2023-11-26 04:55:32,662 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.567e+01 8.894e+01 9.635e+01 1.028e+02 1.225e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-26 04:55:46,128 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484350 2023-11-26 04:55:49,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3228960.0, ans=0.0 2023-11-26 04:55:52,835 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3400, loss[loss=0.0691, simple_loss=0.09243, pruned_loss=0.01328, audio_tagging_loss=0.0096, over 14785.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09116, pruned_loss=0.01244, audio_tagging_loss=0.008847, over 3055648.77 frames. 
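[Annotation] The `scaling.py:213` lines each print a name, the global `batch_count`, and the current value `ans` of a ScheduledFloat: a scalar hyperparameter (balancer probabilities, skip rates, dropout) evaluated from a piecewise-linear schedule over batch count. A sketch of that evaluation; the function name and the breakpoints in the example are made up:

```python
def scheduled_float(batch_count: float,
                    schedule: list[tuple[float, float]]) -> float:
    """Piecewise-linear interpolation over sorted (batch_count, value)
    breakpoints, sketching what `ScheduledFloat ... ans=...` reports."""
    x0, y0 = schedule[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in schedule[1:]:
        if batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)  # interpolate inside this segment
        x0, y0 = x1, y1
    return y0  # clamp past the last breakpoint


# A prob that decays early in training and then stays flat, which is
# why a batch_count of ~3.2e6 reads back the final value of 0.125:
print(scheduled_float(3227026.67, [(0.0, 0.3), (8000.0, 0.125)]))
```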
], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:55:55,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3229026.6666666665, ans=0.125 2023-11-26 04:56:41,985 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484400 2023-11-26 04:56:44,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3229293.3333333335, ans=0.1 2023-11-26 04:56:49,094 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3450, loss[loss=0.07815, simple_loss=0.1153, pruned_loss=0.01302, audio_tagging_loss=0.007498, over 14991.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09115, pruned_loss=0.01269, audio_tagging_loss=0.008746, over 3044913.55 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:56:52,948 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.06 vs. limit=15.0 2023-11-26 04:56:57,424 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:57:01,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3229426.6666666665, ans=0.5 2023-11-26 04:57:03,069 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.47 vs. limit=22.5 2023-11-26 04:57:08,332 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.22 vs. limit=15.0 2023-11-26 04:57:08,440 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.25 vs. limit=12.0 2023-11-26 04:57:13,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3229493.3333333335, ans=0.0 2023-11-26 04:57:24,870 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.793e+01 8.632e+01 9.209e+01 1.007e+02 1.265e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-26 04:57:38,932 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484450 2023-11-26 04:57:45,201 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3500, loss[loss=0.07793, simple_loss=0.1017, pruned_loss=0.01741, audio_tagging_loss=0.009654, over 15012.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09079, pruned_loss=0.01266, audio_tagging_loss=0.008742, over 3047001.73 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:57:59,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3229760.0, ans=0.0 2023-11-26 04:58:08,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3229826.6666666665, ans=0.125 2023-11-26 04:58:08,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3229826.6666666665, ans=0.0 2023-11-26 04:58:09,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3229826.6666666665, ans=0.0 2023-11-26 04:58:12,778 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 04:58:23,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3229893.3333333335, ans=0.2 2023-11-26 04:58:31,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3229960.0, ans=0.05 2023-11-26 04:58:31,490 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.86 vs. limit=15.0 2023-11-26 04:58:34,880 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484500 2023-11-26 04:58:41,683 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3550, loss[loss=0.05513, simple_loss=0.06907, pruned_loss=0.01062, audio_tagging_loss=0.009975, over 14966.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09037, pruned_loss=0.01251, audio_tagging_loss=0.008729, over 3046667.60 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 04:58:41,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3230026.6666666665, ans=0.0 2023-11-26 04:58:41,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3230026.6666666665, ans=0.125 2023-11-26 04:58:43,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3230026.6666666665, ans=0.125 2023-11-26 04:59:02,043 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.61 vs. limit=15.0 2023-11-26 04:59:07,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3230160.0, ans=0.0 2023-11-26 04:59:12,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3230160.0, ans=0.07 2023-11-26 04:59:17,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3230226.6666666665, ans=0.0 2023-11-26 04:59:18,488 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.656e+01 8.467e+01 9.253e+01 9.852e+01 1.320e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-26 04:59:23,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3230226.6666666665, ans=0.125 2023-11-26 04:59:30,883 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484550 2023-11-26 04:59:35,794 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.37 vs. limit=15.0 2023-11-26 04:59:37,133 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3600, loss[loss=0.06509, simple_loss=0.09161, pruned_loss=0.01275, audio_tagging_loss=0.006535, over 13707.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09047, pruned_loss=0.01256, audio_tagging_loss=0.008733, over 3050145.68 frames. 
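[Annotation] The WARNING above (and its siblings throughout this log) drops AudioSet placeholder cuts: 100 input frames become 23 after the ~4x subsampling front-end, fewer than the 24 BPE tokens of the dummy transcript, so no monotonic transducer alignment exists. A sketch of such a filter; `keep_cut` and its front-end arithmetic are assumptions that happen to reproduce the logged 100 -> 23:

```python
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Sketch of the filter behind the 'Exclude cut ...' warnings: drop
    a cut with fewer post-subsampling frames than BPE tokens, since a
    transducer loss is then undefined for it."""
    frames_after_subsampling = (num_frames - 7) // 4
    return frames_after_subsampling >= num_tokens


# The logged case: 100 frames -> 23 after subsampling, but 24 tokens.
print(keep_cut(100, 24))  # False -> excluded from training
```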
], batch size: 52, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:59:47,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3230360.0, ans=0.125 2023-11-26 05:00:00,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3230493.3333333335, ans=0.0 2023-11-26 05:00:03,684 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=22.5 2023-11-26 05:00:15,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3230560.0, ans=0.2 2023-11-26 05:00:19,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3230560.0, ans=0.125 2023-11-26 05:00:22,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3230626.6666666665, ans=0.125 2023-11-26 05:00:25,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3230626.6666666665, ans=0.0 2023-11-26 05:00:25,934 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484600 2023-11-26 05:00:26,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3230626.6666666665, ans=0.1 2023-11-26 05:00:32,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=3230693.3333333335, ans=22.5 2023-11-26 05:00:32,959 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3650, loss[loss=0.06028, simple_loss=0.08192, pruned_loss=0.01131, audio_tagging_loss=0.008011, over 15083.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09057, pruned_loss=0.01256, audio_tagging_loss=0.00865, over 3052496.41 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:00:33,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3230693.3333333335, ans=0.2 2023-11-26 05:00:47,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3230760.0, ans=0.125 2023-11-26 05:01:08,864 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.138e+01 8.921e+01 9.497e+01 1.030e+02 1.167e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-26 05:01:13,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3230893.3333333335, ans=0.125 2023-11-26 05:01:18,059 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.27 vs. limit=15.0 2023-11-26 05:01:21,812 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484650 2023-11-26 05:01:28,665 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3700, loss[loss=0.06458, simple_loss=0.08932, pruned_loss=0.01234, audio_tagging_loss=0.007578, over 16548.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09149, pruned_loss=0.01277, audio_tagging_loss=0.008638, over 3053219.74 frames. 
], batch size: 66, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:01:32,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3231026.6666666665, ans=0.125 2023-11-26 05:02:04,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3231226.6666666665, ans=0.125 2023-11-26 05:02:18,219 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484700 2023-11-26 05:02:24,648 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3750, loss[loss=0.06695, simple_loss=0.09017, pruned_loss=0.01396, audio_tagging_loss=0.007909, over 14479.00 frames. ], tot_loss[loss=0.06732, simple_loss=0.09171, pruned_loss=0.01282, audio_tagging_loss=0.008646, over 3054583.32 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:02:25,260 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.06 vs. limit=15.0 2023-11-26 05:02:43,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3231426.6666666665, ans=0.05 2023-11-26 05:02:45,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3231426.6666666665, ans=0.125 2023-11-26 05:02:48,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3231493.3333333335, ans=0.0 2023-11-26 05:02:51,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3231493.3333333335, ans=0.125 2023-11-26 05:02:55,686 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:03:02,808 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.701e+01 8.844e+01 9.429e+01 1.038e+02 1.452e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 05:03:02,859 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 05:03:10,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3231626.6666666665, ans=0.0 2023-11-26 05:03:13,393 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484750 2023-11-26 05:03:13,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3231626.6666666665, ans=0.0 2023-11-26 05:03:20,218 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3800, loss[loss=0.05783, simple_loss=0.08242, pruned_loss=0.00816, audio_tagging_loss=0.008461, over 14429.00 frames. ], tot_loss[loss=0.06769, simple_loss=0.09212, pruned_loss=0.0129, audio_tagging_loss=0.008724, over 3057581.83 frames. 
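[Annotation] The `optim.py:476` lines summarize a window of recent gradient norms. The five quartile numbers read like (min, 25%, median, 75%, max), and in every entry here `threshold` equals `Clipping_scale` (2.0) times the middle value, e.g. 2.0 x 9.429e+01 = 1.886e+02. A sketch under that assumption; `clipping_threshold` is illustrative, not the optimizer's actual code:

```python
import torch


def clipping_threshold(recent_grad_norms: torch.Tensor,
                       clipping_scale: float = 2.0) -> float:
    """Summarize recent gradient norms by quantiles and derive the
    clipping threshold as clipping_scale * median.  That the five
    logged numbers are exactly these quantiles is an assumption, but
    threshold == 2.0 * the middle value holds throughout this log."""
    qs = torch.quantile(recent_grad_norms,
                        torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    return clipping_scale * qs[2].item()


norms = torch.tensor([77.01, 88.44, 94.29, 103.8, 145.2])
print(clipping_threshold(norms))  # ~188.6, cf. threshold=1.886e+02
```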
], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:03:30,706 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.27 vs. limit=15.0 2023-11-26 05:03:40,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.82 vs. limit=15.0 2023-11-26 05:03:42,323 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.77 vs. limit=10.0 2023-11-26 05:03:56,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3231893.3333333335, ans=0.125 2023-11-26 05:04:07,020 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=15.0 2023-11-26 05:04:09,525 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484800 2023-11-26 05:04:16,584 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3850, loss[loss=0.05645, simple_loss=0.06757, pruned_loss=0.009039, audio_tagging_loss=0.01363, over 15926.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09104, pruned_loss=0.0127, audio_tagging_loss=0.008811, over 3056287.07 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:04:25,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3232026.6666666665, ans=0.09899494936611666 2023-11-26 05:04:31,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3232093.3333333335, ans=0.125 2023-11-26 05:04:31,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=3232093.3333333335, ans=15.0 2023-11-26 05:04:33,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3232093.3333333335, ans=0.125 2023-11-26 05:04:54,117 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.559e+01 8.842e+01 9.367e+01 1.019e+02 1.484e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 05:05:05,897 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484850 2023-11-26 05:05:12,185 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3900, loss[loss=0.07262, simple_loss=0.09324, pruned_loss=0.01412, audio_tagging_loss=0.01187, over 14878.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09019, pruned_loss=0.01263, audio_tagging_loss=0.008908, over 3046628.46 frames. 
], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:05:14,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3232360.0, ans=0.0 2023-11-26 05:05:17,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3232360.0, ans=0.07 2023-11-26 05:05:22,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3232426.6666666665, ans=0.1 2023-11-26 05:05:30,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3232426.6666666665, ans=0.125 2023-11-26 05:05:40,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3232493.3333333335, ans=0.2 2023-11-26 05:05:51,282 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.11 vs. limit=10.0 2023-11-26 05:05:52,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3232560.0, ans=0.1 2023-11-26 05:06:01,424 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484900 2023-11-26 05:06:01,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3232626.6666666665, ans=0.0 2023-11-26 05:06:02,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3232626.6666666665, ans=0.125 2023-11-26 05:06:07,639 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 3950, loss[loss=0.05898, simple_loss=0.08105, pruned_loss=0.009964, audio_tagging_loss=0.00849, over 14200.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09012, pruned_loss=0.01252, audio_tagging_loss=0.008976, over 3039360.70 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:06:09,793 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.67 vs. limit=10.0 2023-11-26 05:06:14,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3232693.3333333335, ans=0.125 2023-11-26 05:06:21,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3232760.0, ans=10.0 2023-11-26 05:06:45,963 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.408e+01 8.918e+01 9.453e+01 1.012e+02 1.260e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-26 05:06:57,173 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 484950 2023-11-26 05:07:04,059 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4000, loss[loss=0.07533, simple_loss=0.1015, pruned_loss=0.01458, audio_tagging_loss=0.01, over 15900.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09029, pruned_loss=0.01264, audio_tagging_loss=0.009114, over 3037456.87 frames. 
], batch size: 60, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:07:20,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3233093.3333333335, ans=0.0 2023-11-26 05:07:23,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3233093.3333333335, ans=0.125 2023-11-26 05:07:27,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3233160.0, ans=0.125 2023-11-26 05:07:38,459 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0 2023-11-26 05:07:54,536 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485000 2023-11-26 05:08:01,193 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4050, loss[loss=0.07627, simple_loss=0.09999, pruned_loss=0.01743, audio_tagging_loss=0.008848, over 15402.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09046, pruned_loss=0.01269, audio_tagging_loss=0.009187, over 3036924.71 frames. ], batch size: 62, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:08:03,411 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 05:08:03,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3233360.0, ans=0.1 2023-11-26 05:08:06,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3233360.0, ans=0.125 2023-11-26 05:08:15,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3233426.6666666665, ans=0.04949747468305833 2023-11-26 05:08:29,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3233493.3333333335, ans=0.125 2023-11-26 05:08:38,911 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.299e+01 8.932e+01 9.380e+01 1.022e+02 1.358e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-26 05:08:43,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3233560.0, ans=0.0 2023-11-26 05:08:48,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3233626.6666666665, ans=0.1 2023-11-26 05:08:49,759 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485050 2023-11-26 05:08:56,102 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4100, loss[loss=0.06495, simple_loss=0.08893, pruned_loss=0.01176, audio_tagging_loss=0.008719, over 16278.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.08994, pruned_loss=0.01251, audio_tagging_loss=0.009218, over 3041599.51 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:08:57,965 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. 
limit=6.0 2023-11-26 05:09:31,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3233893.3333333335, ans=0.1 2023-11-26 05:09:45,638 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485100 2023-11-26 05:09:51,909 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4150, loss[loss=0.07234, simple_loss=0.1063, pruned_loss=0.01254, audio_tagging_loss=0.006636, over 15174.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09039, pruned_loss=0.0125, audio_tagging_loss=0.009066, over 3038943.50 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:09:52,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3234026.6666666665, ans=0.1 2023-11-26 05:10:25,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3234226.6666666665, ans=0.125 2023-11-26 05:10:30,097 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.245e+01 8.624e+01 9.472e+01 1.019e+02 1.478e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-26 05:10:32,252 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 05:10:41,379 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485150 2023-11-26 05:10:46,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3234293.3333333335, ans=0.0 2023-11-26 05:10:48,105 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4200, loss[loss=0.08178, simple_loss=0.1129, pruned_loss=0.01914, audio_tagging_loss=0.006181, over 14927.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09003, pruned_loss=0.01249, audio_tagging_loss=0.008866, over 3032852.91 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:10:59,155 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.28 vs. limit=22.5 2023-11-26 05:11:01,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3234426.6666666665, ans=0.125 2023-11-26 05:11:01,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3234426.6666666665, ans=0.125 2023-11-26 05:11:03,393 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.84 vs. 
limit=12.0 2023-11-26 05:11:15,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3234493.3333333335, ans=0.125 2023-11-26 05:11:23,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3234560.0, ans=0.1 2023-11-26 05:11:24,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3234560.0, ans=0.04949747468305833 2023-11-26 05:11:37,186 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485200 2023-11-26 05:11:43,692 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4250, loss[loss=0.07662, simple_loss=0.1096, pruned_loss=0.01482, audio_tagging_loss=0.006993, over 15375.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08978, pruned_loss=0.01236, audio_tagging_loss=0.008861, over 3041137.63 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:11:53,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3234760.0, ans=0.125 2023-11-26 05:12:07,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3234826.6666666665, ans=10.0 2023-11-26 05:12:14,849 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.98 vs. limit=15.0 2023-11-26 05:12:21,681 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.997e+01 8.763e+01 9.377e+01 1.004e+02 4.197e+02, threshold=1.875e+02, percent-clipped=1.0 2023-11-26 05:12:33,018 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485250 2023-11-26 05:12:33,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=3234960.0, ans=0.1 2023-11-26 05:12:39,365 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4300, loss[loss=0.07033, simple_loss=0.1005, pruned_loss=0.0123, audio_tagging_loss=0.007795, over 15886.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08998, pruned_loss=0.01244, audio_tagging_loss=0.008711, over 3039895.05 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:12:41,265 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2023-11-26 05:12:49,797 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:13:06,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3235160.0, ans=0.1 2023-11-26 05:13:22,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3235226.6666666665, ans=15.0 2023-11-26 05:13:27,763 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.10 vs. limit=15.0 2023-11-26 05:13:28,995 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485300 2023-11-26 05:13:35,801 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4350, loss[loss=0.05907, simple_loss=0.08254, pruned_loss=0.009173, audio_tagging_loss=0.008626, over 15289.00 frames. 
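[Annotation] The `scaling.py:1022` lines compare a whiteness metric against a scheduled limit; the whitening penalty only engages when the metric exceeds the limit, and the "metric=X vs. limit=Y" print shows that check. One plausible form of the metric for the `num_groups=1` case, assumed here rather than copied from scaling.py: the ratio mean(eig^2) / mean(eig)^2 of the channel covariance, which is 1.0 for perfectly white features and grows as energy concentrates in a few directions:

```python
import torch


def whitening_metric(x: torch.Tensor) -> float:
    """Whiteness of (num_frames, num_channels) features: 1.0 when the
    channel covariance is proportional to identity, larger otherwise.
    An assumed formulation; the real scaling.py may differ in detail."""
    x = x - x.mean(dim=0)               # zero-mean per channel
    cov = (x.t() @ x) / x.shape[0]      # (num_channels, num_channels)
    eigs = torch.linalg.eigvalsh(cov)
    return float((eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20))


feats = torch.randn(20000, 512)        # near-white -> metric close to 1
print(whitening_metric(feats))
```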
], tot_loss[loss=0.06617, simple_loss=0.09036, pruned_loss=0.01237, audio_tagging_loss=0.00862, over 3037728.97 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:13:54,955 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.38 vs. limit=22.5 2023-11-26 05:14:09,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3235560.0, ans=0.125 2023-11-26 05:14:12,092 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.83 vs. limit=15.0 2023-11-26 05:14:13,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3235560.0, ans=10.0 2023-11-26 05:14:14,583 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.430e+01 8.639e+01 9.414e+01 1.000e+02 1.262e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-26 05:14:14,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3235560.0, ans=0.0 2023-11-26 05:14:25,131 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485350 2023-11-26 05:14:26,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3235626.6666666665, ans=0.0 2023-11-26 05:14:26,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3235626.6666666665, ans=0.125 2023-11-26 05:14:30,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3235693.3333333335, ans=0.125 2023-11-26 05:14:31,452 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4400, loss[loss=0.06059, simple_loss=0.07363, pruned_loss=0.0117, audio_tagging_loss=0.01207, over 14286.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.0903, pruned_loss=0.01237, audio_tagging_loss=0.008569, over 3043032.84 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:15:08,792 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.20 vs. limit=15.0 2023-11-26 05:15:19,906 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485400 2023-11-26 05:15:27,071 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4450, loss[loss=0.06359, simple_loss=0.08151, pruned_loss=0.01385, audio_tagging_loss=0.008985, over 15050.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09048, pruned_loss=0.01239, audio_tagging_loss=0.008598, over 3050657.22 frames. 
], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:15:33,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3236026.6666666665, ans=0.125 2023-11-26 05:15:42,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3236093.3333333335, ans=0.1 2023-11-26 05:16:06,362 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.912e+01 9.547e+01 1.021e+02 1.319e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-26 05:16:13,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3236293.3333333335, ans=0.1 2023-11-26 05:16:16,078 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485450 2023-11-26 05:16:23,544 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4500, loss[loss=0.06422, simple_loss=0.07532, pruned_loss=0.01452, audio_tagging_loss=0.01204, over 15131.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09134, pruned_loss=0.01272, audio_tagging_loss=0.008522, over 3055959.48 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:16:30,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3236360.0, ans=0.125 2023-11-26 05:16:33,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3236426.6666666665, ans=0.0 2023-11-26 05:16:38,407 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.17 vs. limit=22.5 2023-11-26 05:16:50,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3236493.3333333335, ans=0.125 2023-11-26 05:16:51,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3236493.3333333335, ans=0.07 2023-11-26 05:16:53,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3236493.3333333335, ans=0.1 2023-11-26 05:17:04,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3236560.0, ans=0.125 2023-11-26 05:17:10,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3236626.6666666665, ans=0.125 2023-11-26 05:17:12,568 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485500 2023-11-26 05:17:13,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3236626.6666666665, ans=0.0 2023-11-26 05:17:19,341 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4550, loss[loss=0.05606, simple_loss=0.07016, pruned_loss=0.01079, audio_tagging_loss=0.01019, over 16546.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08984, pruned_loss=0.01251, audio_tagging_loss=0.008674, over 3057231.72 frames. 
], batch size: 64, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:17:25,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3236693.3333333335, ans=0.0 2023-11-26 05:17:34,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3236760.0, ans=0.125 2023-11-26 05:17:37,552 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.42 vs. limit=15.0 2023-11-26 05:17:40,596 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.79 vs. limit=15.0 2023-11-26 05:17:57,819 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.811e+01 8.528e+01 9.112e+01 9.671e+01 1.236e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-26 05:17:58,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3236893.3333333335, ans=0.125 2023-11-26 05:18:00,038 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 05:18:08,210 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485550 2023-11-26 05:18:15,112 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4600, loss[loss=0.06144, simple_loss=0.09138, pruned_loss=0.006974, audio_tagging_loss=0.008779, over 16198.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09019, pruned_loss=0.0127, audio_tagging_loss=0.008702, over 3051300.48 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:19:03,805 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485600 2023-11-26 05:19:10,778 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4650, loss[loss=0.04487, simple_loss=0.05612, pruned_loss=0.006823, audio_tagging_loss=0.00999, over 15388.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.08972, pruned_loss=0.01268, audio_tagging_loss=0.008874, over 3050177.14 frames. 
], batch size: 60, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:19:15,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3237360.0, ans=0.125 2023-11-26 05:19:17,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3237360.0, ans=0.2 2023-11-26 05:19:42,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3237493.3333333335, ans=0.125 2023-11-26 05:19:44,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3237560.0, ans=0.025 2023-11-26 05:19:51,901 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.215e+01 8.706e+01 9.399e+01 1.022e+02 1.601e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-26 05:19:57,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3237626.6666666665, ans=0.0 2023-11-26 05:19:59,972 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485650 2023-11-26 05:20:06,221 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4700, loss[loss=0.05496, simple_loss=0.0752, pruned_loss=0.007782, audio_tagging_loss=0.009577, over 14085.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.08956, pruned_loss=0.01258, audio_tagging_loss=0.008983, over 3040347.15 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:20:20,430 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.90 vs. limit=22.5 2023-11-26 05:20:37,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3237826.6666666665, ans=0.125 2023-11-26 05:20:40,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3237893.3333333335, ans=0.125 2023-11-26 05:20:42,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3237893.3333333335, ans=0.0 2023-11-26 05:20:54,785 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485700 2023-11-26 05:21:02,120 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4750, loss[loss=0.05915, simple_loss=0.08087, pruned_loss=0.008273, audio_tagging_loss=0.01044, over 15181.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.08971, pruned_loss=0.01257, audio_tagging_loss=0.00898, over 3044830.75 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:21:37,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3238226.6666666665, ans=0.0 2023-11-26 05:21:42,924 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.508e+01 8.672e+01 9.207e+01 9.886e+01 1.229e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-26 05:21:50,948 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485750 2023-11-26 05:21:57,724 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4800, loss[loss=0.07556, simple_loss=0.1014, pruned_loss=0.01688, audio_tagging_loss=0.007973, over 14907.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08883, pruned_loss=0.01231, audio_tagging_loss=0.009149, over 3049454.47 frames. 
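[Annotation] With fp16 training enabled, the `grad_scale` field in the loss lines (moving between 32.0, 16.0, and 8.0 across these batches) is the dynamic loss scale: halved when scaled gradients overflow, grown back after a run of clean steps. A minimal mixed-precision step with `torch.cuda.amp` showing that mechanism; the concrete factors and growth interval below are assumptions, and `fp16_step` is a generic sketch rather than icefall's training loop:

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0,
                                   growth_factor=2.0,
                                   backoff_factor=0.5,
                                   growth_interval=2000)


def fp16_step(model, optimizer, inputs, targets, criterion):
    """One mixed-precision step: backprop a scaled loss, let the scaler
    unscale and skip the update on inf/nan, then adapt the scale."""
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skips optimizer.step() if grads overflowed
    scaler.update()          # halve on overflow, grow after clean steps
    return loss.detach()
```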
], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:22:13,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3238426.6666666665, ans=0.0 2023-11-26 05:22:14,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3238426.6666666665, ans=0.2 2023-11-26 05:22:23,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3238493.3333333335, ans=0.125 2023-11-26 05:22:23,387 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:22:32,636 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.72 vs. limit=12.0 2023-11-26 05:22:46,897 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485800 2023-11-26 05:22:52,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3238693.3333333335, ans=0.0 2023-11-26 05:22:53,461 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4850, loss[loss=0.07856, simple_loss=0.1049, pruned_loss=0.01574, audio_tagging_loss=0.0104, over 14510.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08934, pruned_loss=0.01218, audio_tagging_loss=0.009245, over 3054954.74 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:23:34,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3238893.3333333335, ans=0.1 2023-11-26 05:23:35,553 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.989e+01 8.610e+01 9.289e+01 1.009e+02 1.598e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-26 05:23:38,226 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.14 vs. limit=15.0 2023-11-26 05:23:41,970 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485850 2023-11-26 05:23:48,250 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4900, loss[loss=0.0516, simple_loss=0.07107, pruned_loss=0.006065, audio_tagging_loss=0.01, over 16517.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09058, pruned_loss=0.01237, audio_tagging_loss=0.009163, over 3057920.58 frames. ], batch size: 64, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:23:48,716 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.20 vs. 
limit=15.0 2023-11-26 05:23:52,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3239026.6666666665, ans=0.125 2023-11-26 05:24:29,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3239226.6666666665, ans=0.125 2023-11-26 05:24:31,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3239293.3333333335, ans=0.125 2023-11-26 05:24:35,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3239293.3333333335, ans=0.125 2023-11-26 05:24:37,335 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485900 2023-11-26 05:24:37,708 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.57 vs. limit=12.0 2023-11-26 05:24:42,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3239360.0, ans=0.07 2023-11-26 05:24:43,568 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 4950, loss[loss=0.06305, simple_loss=0.08896, pruned_loss=0.01068, audio_tagging_loss=0.007886, over 15111.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09025, pruned_loss=0.01237, audio_tagging_loss=0.00902, over 3056272.53 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:24:52,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3239360.0, ans=0.015 2023-11-26 05:24:54,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3239426.6666666665, ans=0.125 2023-11-26 05:24:59,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3239426.6666666665, ans=0.0 2023-11-26 05:25:04,181 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.35 vs. limit=15.0 2023-11-26 05:25:21,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3239560.0, ans=0.125 2023-11-26 05:25:24,134 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.38 vs. limit=15.0 2023-11-26 05:25:25,805 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.598e+01 8.689e+01 9.233e+01 9.794e+01 1.211e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-26 05:25:33,268 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 485950 2023-11-26 05:25:36,028 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.59 vs. limit=15.0 2023-11-26 05:25:39,614 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5000, loss[loss=0.04819, simple_loss=0.06665, pruned_loss=0.007254, audio_tagging_loss=0.007614, over 15029.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.0902, pruned_loss=0.01232, audio_tagging_loss=0.008907, over 3056298.68 frames. 
], batch size: 58, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:25:53,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3239760.0, ans=0.0 2023-11-26 05:25:59,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3239760.0, ans=0.5 2023-11-26 05:26:10,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3239826.6666666665, ans=0.125 2023-11-26 05:26:13,574 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.79 vs. limit=15.0 2023-11-26 05:26:13,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3239893.3333333335, ans=0.0 2023-11-26 05:26:16,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3239893.3333333335, ans=15.0 2023-11-26 05:26:18,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3239893.3333333335, ans=0.125 2023-11-26 05:26:18,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3239893.3333333335, ans=0.125 2023-11-26 05:26:18,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3239893.3333333335, ans=0.0 2023-11-26 05:26:19,067 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.28 vs. limit=15.0 2023-11-26 05:26:28,171 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486000 2023-11-26 05:26:29,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3239960.0, ans=0.1 2023-11-26 05:26:34,636 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5050, loss[loss=0.03916, simple_loss=0.05061, pruned_loss=0.005017, audio_tagging_loss=0.008835, over 14628.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08992, pruned_loss=0.0124, audio_tagging_loss=0.008758, over 3049863.31 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:26:43,729 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.88 vs. limit=15.0 2023-11-26 05:27:02,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3240160.0, ans=0.125 2023-11-26 05:27:16,763 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.060e+01 8.723e+01 9.210e+01 1.029e+02 1.181e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-26 05:27:21,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3240293.3333333335, ans=0.0 2023-11-26 05:27:23,666 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486050 2023-11-26 05:27:30,386 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5100, loss[loss=0.05776, simple_loss=0.06531, pruned_loss=0.01073, audio_tagging_loss=0.01437, over 16221.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08936, pruned_loss=0.01241, audio_tagging_loss=0.008767, over 3046968.35 frames. 
], batch size: 61, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:27:37,592 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.79 vs. limit=15.0 2023-11-26 05:27:47,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3240426.6666666665, ans=0.2 2023-11-26 05:27:49,008 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0 2023-11-26 05:28:07,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3240560.0, ans=0.09899494936611666 2023-11-26 05:28:19,383 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486100 2023-11-26 05:28:26,092 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5150, loss[loss=0.06709, simple_loss=0.09267, pruned_loss=0.01579, audio_tagging_loss=0.004962, over 14480.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09006, pruned_loss=0.01245, audio_tagging_loss=0.008655, over 3047579.85 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:28:31,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3240693.3333333335, ans=0.125 2023-11-26 05:28:55,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3240826.6666666665, ans=0.1 2023-11-26 05:29:03,364 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0 2023-11-26 05:29:08,269 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.744e+01 8.813e+01 9.450e+01 1.017e+02 1.282e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-26 05:29:09,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3240960.0, ans=0.0 2023-11-26 05:29:14,697 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486150 2023-11-26 05:29:16,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3240960.0, ans=0.125 2023-11-26 05:29:21,069 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5200, loss[loss=0.06429, simple_loss=0.08426, pruned_loss=0.01011, audio_tagging_loss=0.01205, over 14402.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09001, pruned_loss=0.0123, audio_tagging_loss=0.008679, over 3045648.92 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:29:43,396 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.67 vs. limit=15.0 2023-11-26 05:29:44,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3241160.0, ans=0.2 2023-11-26 05:30:10,283 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486200 2023-11-26 05:30:16,780 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5250, loss[loss=0.05658, simple_loss=0.07853, pruned_loss=0.008408, audio_tagging_loss=0.008909, over 15646.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08949, pruned_loss=0.01212, audio_tagging_loss=0.008707, over 3047929.02 frames. 
], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:30:16,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3241360.0, ans=0.125 2023-11-26 05:30:44,030 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.32 vs. limit=22.5 2023-11-26 05:30:45,266 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.53 vs. limit=22.5 2023-11-26 05:30:53,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3241560.0, ans=0.125 2023-11-26 05:30:58,897 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.332e+01 8.725e+01 9.409e+01 1.008e+02 1.630e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-26 05:31:02,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3241626.6666666665, ans=0.95 2023-11-26 05:31:05,907 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486250 2023-11-26 05:31:13,312 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5300, loss[loss=0.08422, simple_loss=0.1131, pruned_loss=0.01659, audio_tagging_loss=0.01109, over 14279.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09012, pruned_loss=0.01224, audio_tagging_loss=0.008762, over 3042173.74 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:31:17,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3241693.3333333335, ans=0.0 2023-11-26 05:31:19,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3241693.3333333335, ans=0.125 2023-11-26 05:31:22,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=3241693.3333333335, ans=15.0 2023-11-26 05:31:49,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3241893.3333333335, ans=0.1 2023-11-26 05:31:56,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3241960.0, ans=0.1 2023-11-26 05:31:57,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3241960.0, ans=0.125 2023-11-26 05:32:01,945 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486300 2023-11-26 05:32:08,104 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5350, loss[loss=0.0615, simple_loss=0.08302, pruned_loss=0.01265, audio_tagging_loss=0.007342, over 16022.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.0907, pruned_loss=0.01233, audio_tagging_loss=0.008744, over 3039174.78 frames. 
], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:32:10,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3242026.6666666665, ans=0.2 2023-11-26 05:32:12,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3242026.6666666665, ans=0.125 2023-11-26 05:32:49,969 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.283e+01 8.485e+01 9.147e+01 9.991e+01 1.214e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-26 05:32:56,332 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486350 2023-11-26 05:33:03,202 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5400, loss[loss=0.06885, simple_loss=0.09253, pruned_loss=0.01402, audio_tagging_loss=0.008573, over 15343.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09075, pruned_loss=0.01238, audio_tagging_loss=0.008835, over 3039978.30 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:33:03,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3242360.0, ans=0.0 2023-11-26 05:33:04,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3242360.0, ans=0.0 2023-11-26 05:33:04,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3242360.0, ans=0.2 2023-11-26 05:33:08,078 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.72 vs. limit=22.5 2023-11-26 05:33:11,401 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.55 vs. limit=8.0 2023-11-26 05:33:24,426 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.06 vs. limit=12.0 2023-11-26 05:33:27,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3242493.3333333335, ans=0.1 2023-11-26 05:33:30,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3242493.3333333335, ans=0.1 2023-11-26 05:33:33,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3242493.3333333335, ans=0.07 2023-11-26 05:33:37,034 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.90 vs. limit=15.0 2023-11-26 05:33:51,805 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486400 2023-11-26 05:33:59,179 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5450, loss[loss=0.06892, simple_loss=0.08846, pruned_loss=0.01435, audio_tagging_loss=0.01034, over 15021.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09159, pruned_loss=0.01257, audio_tagging_loss=0.008789, over 3045534.45 frames. 
], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:33:59,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3242693.3333333335, ans=0.0 2023-11-26 05:34:22,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3242826.6666666665, ans=0.125 2023-11-26 05:34:23,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3242826.6666666665, ans=0.2 2023-11-26 05:34:25,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3242826.6666666665, ans=0.05 2023-11-26 05:34:37,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3242893.3333333335, ans=0.125 2023-11-26 05:34:41,182 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.720e+01 8.605e+01 9.179e+01 9.906e+01 1.952e+02, threshold=1.836e+02, percent-clipped=1.0 2023-11-26 05:34:44,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3242960.0, ans=0.07 2023-11-26 05:34:48,116 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486450 2023-11-26 05:34:49,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3242960.0, ans=0.125 2023-11-26 05:34:54,493 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5500, loss[loss=0.07606, simple_loss=0.1106, pruned_loss=0.01442, audio_tagging_loss=0.006333, over 16600.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09104, pruned_loss=0.01239, audio_tagging_loss=0.008778, over 3046165.86 frames. ], batch size: 62, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:35:18,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3243160.0, ans=0.0 2023-11-26 05:35:34,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3243226.6666666665, ans=0.125 2023-11-26 05:35:34,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3243226.6666666665, ans=0.125 2023-11-26 05:35:42,915 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486500 2023-11-26 05:35:46,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3243293.3333333335, ans=0.2 2023-11-26 05:35:49,792 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5550, loss[loss=0.07424, simple_loss=0.1099, pruned_loss=0.01338, audio_tagging_loss=0.005912, over 15090.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.09197, pruned_loss=0.01255, audio_tagging_loss=0.008843, over 3050262.26 frames. 
], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:35:55,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3243360.0, ans=0.125 2023-11-26 05:36:04,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3243426.6666666665, ans=0.125 2023-11-26 05:36:04,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3243426.6666666665, ans=0.1 2023-11-26 05:36:10,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3243426.6666666665, ans=0.0 2023-11-26 05:36:11,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3243493.3333333335, ans=0.0 2023-11-26 05:36:32,123 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 8.745e+01 9.267e+01 1.002e+02 1.641e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-26 05:36:35,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3243626.6666666665, ans=0.125 2023-11-26 05:36:36,622 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:36:38,525 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486550 2023-11-26 05:36:41,853 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:36:45,274 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5600, loss[loss=0.06924, simple_loss=0.09682, pruned_loss=0.01241, audio_tagging_loss=0.008422, over 15372.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09126, pruned_loss=0.01236, audio_tagging_loss=0.008973, over 3047107.07 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:36:50,546 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.40 vs. limit=10.0 2023-11-26 05:36:51,508 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.82 vs. limit=22.5 2023-11-26 05:37:01,181 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:37:21,959 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=15.0 2023-11-26 05:37:23,717 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 05:37:28,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3243960.0, ans=0.0 2023-11-26 05:37:34,269 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486600 2023-11-26 05:37:34,694 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.42 vs. limit=22.5 2023-11-26 05:37:40,751 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5650, loss[loss=0.05034, simple_loss=0.06392, pruned_loss=0.008275, audio_tagging_loss=0.0101, over 13813.00 frames. ], tot_loss[loss=0.06773, simple_loss=0.0925, pruned_loss=0.01253, audio_tagging_loss=0.008943, over 3051828.62 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:37:41,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3244026.6666666665, ans=0.07 2023-11-26 05:38:09,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3244160.0, ans=0.0 2023-11-26 05:38:19,045 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.87 vs. limit=15.0 2023-11-26 05:38:23,772 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.303e+01 8.720e+01 9.280e+01 9.877e+01 1.364e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 05:38:29,539 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486650 2023-11-26 05:38:35,767 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5700, loss[loss=0.06117, simple_loss=0.08512, pruned_loss=0.01092, audio_tagging_loss=0.007691, over 14475.00 frames. ], tot_loss[loss=0.0676, simple_loss=0.09215, pruned_loss=0.01261, audio_tagging_loss=0.008914, over 3052934.66 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:38:47,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3244426.6666666665, ans=0.1 2023-11-26 05:38:59,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3244493.3333333335, ans=0.1 2023-11-26 05:39:00,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3244493.3333333335, ans=0.0 2023-11-26 05:39:09,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3244560.0, ans=0.5 2023-11-26 05:39:18,201 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2023-11-26 05:39:24,736 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486700 2023-11-26 05:39:31,502 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5750, loss[loss=0.07756, simple_loss=0.1059, pruned_loss=0.01814, audio_tagging_loss=0.006489, over 15239.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09111, pruned_loss=0.01256, audio_tagging_loss=0.008892, over 3051001.43 frames. 
], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:39:34,824 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:39:38,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3244693.3333333335, ans=0.125 2023-11-26 05:39:56,299 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.06 vs. limit=15.0 2023-11-26 05:40:15,738 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.860e+01 8.613e+01 9.170e+01 1.044e+02 1.478e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-26 05:40:20,591 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486750 2023-11-26 05:40:26,822 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5800, loss[loss=0.06045, simple_loss=0.08201, pruned_loss=0.0112, audio_tagging_loss=0.008238, over 13623.00 frames. ], tot_loss[loss=0.06767, simple_loss=0.09212, pruned_loss=0.01288, audio_tagging_loss=0.008725, over 3054420.22 frames. ], batch size: 50, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:40:42,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3245093.3333333335, ans=0.1 2023-11-26 05:41:11,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3245293.3333333335, ans=0.0 2023-11-26 05:41:15,456 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486800 2023-11-26 05:41:21,984 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5850, loss[loss=0.06643, simple_loss=0.08851, pruned_loss=0.01336, audio_tagging_loss=0.008812, over 14349.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09092, pruned_loss=0.01262, audio_tagging_loss=0.008685, over 3048771.47 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:41:27,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3245360.0, ans=0.125 2023-11-26 05:41:33,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3245426.6666666665, ans=0.0 2023-11-26 05:41:40,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3245426.6666666665, ans=0.2 2023-11-26 05:41:43,181 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.50 vs. limit=15.0 2023-11-26 05:41:48,064 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.40 vs. limit=15.0 2023-11-26 05:41:54,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3245560.0, ans=0.09899494936611666 2023-11-26 05:42:06,260 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.540e+01 9.221e+01 1.014e+02 1.317e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-26 05:42:11,724 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486850 2023-11-26 05:42:17,892 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5900, loss[loss=0.08756, simple_loss=0.1157, pruned_loss=0.02156, audio_tagging_loss=0.00814, over 15563.00 frames. 
], tot_loss[loss=0.06636, simple_loss=0.0905, pruned_loss=0.01247, audio_tagging_loss=0.008644, over 3045190.27 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:42:32,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3245760.0, ans=0.125 2023-11-26 05:42:40,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3245826.6666666665, ans=0.0 2023-11-26 05:43:06,760 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486900 2023-11-26 05:43:13,575 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 5950, loss[loss=0.07501, simple_loss=0.1059, pruned_loss=0.01539, audio_tagging_loss=0.006682, over 15823.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09054, pruned_loss=0.01239, audio_tagging_loss=0.008695, over 3046527.40 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:43:18,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3246026.6666666665, ans=10.0 2023-11-26 05:43:27,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3246093.3333333335, ans=0.125 2023-11-26 05:43:42,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3246160.0, ans=0.125 2023-11-26 05:43:57,759 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.996e+01 8.522e+01 9.337e+01 1.020e+02 1.344e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-26 05:44:02,128 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 486950 2023-11-26 05:44:08,294 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6000, loss[loss=0.06597, simple_loss=0.08862, pruned_loss=0.0112, audio_tagging_loss=0.01046, over 15746.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09036, pruned_loss=0.01242, audio_tagging_loss=0.008592, over 3046929.71 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:44:08,297 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-26 05:44:40,563 INFO [train_asr.py:1267] (0/4) Epoch 41, validation: loss=0.05752, simple_loss=0.0506, pruned_loss=0.005164, audio_tagging_loss=0.02705, over 4681554.00 frames. 2023-11-26 05:44:40,564 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-26 05:44:42,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3246360.0, ans=0.125 2023-11-26 05:44:56,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3246426.6666666665, ans=0.125 2023-11-26 05:45:07,037 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.07 vs. 
limit=22.5 2023-11-26 05:45:13,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3246560.0, ans=0.125 2023-11-26 05:45:14,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3246560.0, ans=0.0 2023-11-26 05:45:18,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3246560.0, ans=0.015 2023-11-26 05:45:19,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3246560.0, ans=0.125 2023-11-26 05:45:20,432 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 05:45:23,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3246560.0, ans=0.1 2023-11-26 05:45:28,158 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.98 vs. limit=12.0 2023-11-26 05:45:29,947 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487000 2023-11-26 05:45:34,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3246626.6666666665, ans=0.0 2023-11-26 05:45:36,450 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6050, loss[loss=0.05253, simple_loss=0.06443, pruned_loss=0.007322, audio_tagging_loss=0.01299, over 13862.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09035, pruned_loss=0.01251, audio_tagging_loss=0.008685, over 3041828.78 frames. ], batch size: 52, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:45:39,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3246693.3333333335, ans=0.2 2023-11-26 05:45:48,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3246760.0, ans=0.125 2023-11-26 05:45:50,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3246760.0, ans=0.05 2023-11-26 05:45:53,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3246760.0, ans=0.125 2023-11-26 05:45:55,621 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.51 vs. 
limit=15.0 2023-11-26 05:45:56,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3246760.0, ans=0.2 2023-11-26 05:45:57,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3246826.6666666665, ans=0.0 2023-11-26 05:46:00,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3246826.6666666665, ans=0.0 2023-11-26 05:46:21,413 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.073e+01 8.605e+01 9.174e+01 9.669e+01 1.333e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-26 05:46:21,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3246960.0, ans=0.125 2023-11-26 05:46:25,735 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487050 2023-11-26 05:46:25,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3246960.0, ans=0.125 2023-11-26 05:46:25,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3246960.0, ans=0.0 2023-11-26 05:46:32,117 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6100, loss[loss=0.04786, simple_loss=0.06407, pruned_loss=0.005669, audio_tagging_loss=0.01016, over 15396.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09056, pruned_loss=0.01243, audio_tagging_loss=0.008747, over 3042371.63 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:46:50,279 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2023-11-26 05:46:53,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3247093.3333333335, ans=0.025 2023-11-26 05:46:56,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3247160.0, ans=0.125 2023-11-26 05:47:11,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3247226.6666666665, ans=0.125 2023-11-26 05:47:21,115 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.21 vs. limit=15.0 2023-11-26 05:47:21,769 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487100 2023-11-26 05:47:28,039 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6150, loss[loss=0.06051, simple_loss=0.08514, pruned_loss=0.01074, audio_tagging_loss=0.007204, over 15491.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09113, pruned_loss=0.01246, audio_tagging_loss=0.008755, over 3043045.25 frames. 
], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:47:48,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3247426.6666666665, ans=0.125 2023-11-26 05:47:51,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3247493.3333333335, ans=0.0 2023-11-26 05:47:58,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3247493.3333333335, ans=0.0 2023-11-26 05:48:11,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3247626.6666666665, ans=0.1 2023-11-26 05:48:12,589 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.193e+01 8.728e+01 9.335e+01 1.012e+02 1.245e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-26 05:48:15,910 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.33 vs. limit=15.0 2023-11-26 05:48:17,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3247626.6666666665, ans=0.125 2023-11-26 05:48:17,899 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487150 2023-11-26 05:48:24,209 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6200, loss[loss=0.08086, simple_loss=0.1047, pruned_loss=0.01991, audio_tagging_loss=0.008596, over 15381.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09071, pruned_loss=0.01248, audio_tagging_loss=0.00881, over 3041883.48 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:48:25,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3247693.3333333335, ans=0.0 2023-11-26 05:48:32,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3247693.3333333335, ans=0.2 2023-11-26 05:49:07,097 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:49:08,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3247960.0, ans=0.0 2023-11-26 05:49:13,254 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487200 2023-11-26 05:49:15,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3247960.0, ans=0.2 2023-11-26 05:49:18,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3248026.6666666665, ans=0.125 2023-11-26 05:49:19,825 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6250, loss[loss=0.07094, simple_loss=0.1023, pruned_loss=0.01268, audio_tagging_loss=0.007132, over 14249.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08977, pruned_loss=0.01251, audio_tagging_loss=0.00881, over 3035774.00 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:49:26,787 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.42 vs. limit=15.0 2023-11-26 05:49:47,689 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.76 vs. 
limit=15.0 2023-11-26 05:49:50,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3248160.0, ans=0.125 2023-11-26 05:49:59,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3248226.6666666665, ans=0.0 2023-11-26 05:50:04,065 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 8.628e+01 9.158e+01 1.005e+02 1.454e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-26 05:50:08,396 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487250 2023-11-26 05:50:15,170 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6300, loss[loss=0.05408, simple_loss=0.0682, pruned_loss=0.008377, audio_tagging_loss=0.0116, over 15404.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09007, pruned_loss=0.01249, audio_tagging_loss=0.008887, over 3030857.20 frames. ], batch size: 63, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:50:20,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3248360.0, ans=0.125 2023-11-26 05:50:20,470 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.78 vs. limit=15.0 2023-11-26 05:51:04,427 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487300 2023-11-26 05:51:09,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3248626.6666666665, ans=0.125 2023-11-26 05:51:11,378 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.82 vs. limit=15.0 2023-11-26 05:51:11,864 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6350, loss[loss=0.0516, simple_loss=0.07221, pruned_loss=0.00563, audio_tagging_loss=0.009868, over 16337.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09031, pruned_loss=0.01242, audio_tagging_loss=0.008927, over 3037287.54 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:51:32,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3248826.6666666665, ans=0.125 2023-11-26 05:51:36,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3248826.6666666665, ans=0.0 2023-11-26 05:51:39,460 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.49 vs. 
limit=15.0 2023-11-26 05:51:44,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3248893.3333333335, ans=0.125 2023-11-26 05:51:44,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3248893.3333333335, ans=0.1 2023-11-26 05:51:48,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3248893.3333333335, ans=0.125 2023-11-26 05:51:56,308 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.938e+01 8.555e+01 9.166e+01 9.747e+01 1.455e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-26 05:51:56,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3248960.0, ans=0.035 2023-11-26 05:52:00,767 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487350 2023-11-26 05:52:06,950 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6400, loss[loss=0.07782, simple_loss=0.09722, pruned_loss=0.02068, audio_tagging_loss=0.008534, over 15420.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08969, pruned_loss=0.01227, audio_tagging_loss=0.009037, over 3031058.75 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:52:10,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3249026.6666666665, ans=0.0 2023-11-26 05:52:21,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3249093.3333333335, ans=0.125 2023-11-26 05:52:55,835 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487400 2023-11-26 05:53:02,864 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6450, loss[loss=0.05902, simple_loss=0.07778, pruned_loss=0.01053, audio_tagging_loss=0.009598, over 15134.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09026, pruned_loss=0.01226, audio_tagging_loss=0.009027, over 3045423.21 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:53:11,449 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.64 vs. limit=15.0 2023-11-26 05:53:13,870 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.04 vs. limit=22.5 2023-11-26 05:53:17,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3249426.6666666665, ans=0.0 2023-11-26 05:53:26,070 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.83 vs. limit=22.5 2023-11-26 05:53:47,383 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.968e+01 8.690e+01 9.179e+01 1.001e+02 1.387e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-26 05:53:52,278 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487450 2023-11-26 05:53:59,113 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6500, loss[loss=0.06405, simple_loss=0.08887, pruned_loss=0.009674, audio_tagging_loss=0.009945, over 14583.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09053, pruned_loss=0.01228, audio_tagging_loss=0.009075, over 3045524.19 frames. 
], batch size: 58, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:54:25,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3249826.6666666665, ans=0.125 2023-11-26 05:54:42,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3249960.0, ans=0.125 2023-11-26 05:54:48,357 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487500 2023-11-26 05:54:54,636 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6550, loss[loss=0.05965, simple_loss=0.08658, pruned_loss=0.008636, audio_tagging_loss=0.007728, over 15239.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09125, pruned_loss=0.01251, audio_tagging_loss=0.008969, over 3051161.07 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:54:54,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3250026.6666666665, ans=0.125 2023-11-26 05:55:12,879 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.26 vs. limit=15.0 2023-11-26 05:55:37,618 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0 2023-11-26 05:55:39,382 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.986e+01 8.523e+01 8.995e+01 9.830e+01 1.214e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-26 05:55:39,930 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.11 vs. limit=12.0 2023-11-26 05:55:42,876 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:55:43,726 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487550 2023-11-26 05:55:45,192 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.39 vs. limit=15.0 2023-11-26 05:55:45,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3250293.3333333335, ans=0.5 2023-11-26 05:55:50,079 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6600, loss[loss=0.04863, simple_loss=0.05618, pruned_loss=0.007639, audio_tagging_loss=0.0129, over 15945.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09122, pruned_loss=0.01253, audio_tagging_loss=0.008837, over 3046059.64 frames. ], batch size: 63, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:56:05,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3250426.6666666665, ans=0.1 2023-11-26 05:56:21,511 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.97 vs. 
limit=15.0 2023-11-26 05:56:40,087 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487600 2023-11-26 05:56:43,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3250626.6666666665, ans=0.125 2023-11-26 05:56:47,245 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6650, loss[loss=0.0563, simple_loss=0.0812, pruned_loss=0.008774, audio_tagging_loss=0.006924, over 15328.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08962, pruned_loss=0.01239, audio_tagging_loss=0.00881, over 3045180.39 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:56:55,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3250693.3333333335, ans=0.125 2023-11-26 05:56:56,554 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.94 vs. limit=15.0 2023-11-26 05:57:09,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3250826.6666666665, ans=0.125 2023-11-26 05:57:10,810 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=15.0 2023-11-26 05:57:14,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3250826.6666666665, ans=0.0 2023-11-26 05:57:27,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3250893.3333333335, ans=0.125 2023-11-26 05:57:27,755 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.78 vs. limit=15.0 2023-11-26 05:57:32,013 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.412e+01 8.660e+01 9.061e+01 9.694e+01 1.150e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-26 05:57:36,387 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487650 2023-11-26 05:57:39,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3250960.0, ans=0.0 2023-11-26 05:57:42,720 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6700, loss[loss=0.06852, simple_loss=0.09592, pruned_loss=0.01013, audio_tagging_loss=0.01044, over 15007.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09055, pruned_loss=0.01248, audio_tagging_loss=0.008795, over 3044438.47 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:57:44,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3251026.6666666665, ans=0.125 2023-11-26 05:57:55,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3251093.3333333335, ans=0.125 2023-11-26 05:58:00,559 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.23 vs. 
limit=15.0 2023-11-26 05:58:07,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3251160.0, ans=0.2 2023-11-26 05:58:14,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=3251160.0, ans=0.02 2023-11-26 05:58:15,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3251226.6666666665, ans=0.125 2023-11-26 05:58:32,019 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487700 2023-11-26 05:58:32,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3251293.3333333335, ans=0.2 2023-11-26 05:58:38,278 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6750, loss[loss=0.06487, simple_loss=0.08607, pruned_loss=0.01294, audio_tagging_loss=0.008901, over 14002.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09045, pruned_loss=0.01254, audio_tagging_loss=0.008817, over 3037380.78 frames. ], batch size: 53, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:58:39,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3251360.0, ans=0.5 2023-11-26 05:58:45,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3251360.0, ans=0.0 2023-11-26 05:59:12,321 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:59:15,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3251560.0, ans=0.1 2023-11-26 05:59:17,982 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.15 vs. limit=15.0 2023-11-26 05:59:24,314 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.344e+01 8.663e+01 9.356e+01 1.018e+02 1.599e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 05:59:27,722 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487750 2023-11-26 05:59:34,839 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6800, loss[loss=0.06263, simple_loss=0.08836, pruned_loss=0.009159, audio_tagging_loss=0.009293, over 15344.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09036, pruned_loss=0.01243, audio_tagging_loss=0.008782, over 3035845.50 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:59:42,355 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.90 vs. limit=15.0 2023-11-26 05:59:48,903 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.86 vs. limit=15.0 2023-11-26 06:00:24,534 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487800 2023-11-26 06:00:31,087 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6850, loss[loss=0.06724, simple_loss=0.09863, pruned_loss=0.009261, audio_tagging_loss=0.008664, over 15559.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08947, pruned_loss=0.01236, audio_tagging_loss=0.008703, over 3037004.81 frames. 
], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:00:51,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3252093.3333333335, ans=0.125 2023-11-26 06:01:01,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3252160.0, ans=0.1 2023-11-26 06:01:14,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3252293.3333333335, ans=0.0 2023-11-26 06:01:16,553 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.387e+01 8.578e+01 9.183e+01 9.945e+01 1.364e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-26 06:01:17,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3252293.3333333335, ans=0.125 2023-11-26 06:01:18,021 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.36 vs. limit=15.0 2023-11-26 06:01:19,756 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487850 2023-11-26 06:01:26,616 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6900, loss[loss=0.06606, simple_loss=0.08982, pruned_loss=0.01207, audio_tagging_loss=0.009081, over 15488.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.0897, pruned_loss=0.01249, audio_tagging_loss=0.008709, over 3043355.69 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:01:39,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3252426.6666666665, ans=0.0 2023-11-26 06:01:43,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3252426.6666666665, ans=0.125 2023-11-26 06:01:58,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3252493.3333333335, ans=0.0 2023-11-26 06:02:05,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3252560.0, ans=0.125 2023-11-26 06:02:09,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3252560.0, ans=0.0 2023-11-26 06:02:10,262 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 06:02:12,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3252626.6666666665, ans=0.0 2023-11-26 06:02:15,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3252626.6666666665, ans=0.125 2023-11-26 06:02:16,116 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487900 2023-11-26 06:02:18,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3252626.6666666665, ans=0.125 2023-11-26 06:02:21,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.36 vs. limit=15.0 2023-11-26 06:02:22,933 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 6950, loss[loss=0.07804, simple_loss=0.1076, pruned_loss=0.01621, audio_tagging_loss=0.00801, over 14883.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08981, pruned_loss=0.01241, audio_tagging_loss=0.0087, over 3044232.29 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:02:34,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3252760.0, ans=0.1 2023-11-26 06:02:40,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3252760.0, ans=0.125 2023-11-26 06:02:43,940 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.88 vs. limit=15.0 2023-11-26 06:02:57,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3252893.3333333335, ans=0.125 2023-11-26 06:03:01,241 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.27 vs. limit=15.0 2023-11-26 06:03:03,745 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.94 vs. limit=15.0 2023-11-26 06:03:10,032 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.635e+01 8.822e+01 9.326e+01 1.010e+02 2.073e+02, threshold=1.865e+02, percent-clipped=1.0 2023-11-26 06:03:12,281 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 487950 2023-11-26 06:03:18,660 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7000, loss[loss=0.07946, simple_loss=0.1105, pruned_loss=0.01644, audio_tagging_loss=0.007749, over 16725.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08981, pruned_loss=0.01239, audio_tagging_loss=0.0088, over 3046772.47 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:03:19,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3253026.6666666665, ans=0.025 2023-11-26 06:03:37,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3253093.3333333335, ans=0.125 2023-11-26 06:03:47,490 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.23 vs. 
limit=22.5 2023-11-26 06:04:00,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3253226.6666666665, ans=0.125 2023-11-26 06:04:00,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3253226.6666666665, ans=0.125 2023-11-26 06:04:07,802 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488000 2023-11-26 06:04:09,088 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-488000.pt 2023-11-26 06:04:16,282 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7050, loss[loss=0.08021, simple_loss=0.115, pruned_loss=0.01445, audio_tagging_loss=0.008284, over 16658.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08929, pruned_loss=0.01241, audio_tagging_loss=0.008884, over 3052753.97 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:04:27,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3253426.6666666665, ans=0.0 2023-11-26 06:04:35,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3253426.6666666665, ans=0.125 2023-11-26 06:04:39,930 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.63 vs. limit=12.0 2023-11-26 06:04:40,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3253493.3333333335, ans=0.125 2023-11-26 06:05:02,585 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 8.684e+01 9.399e+01 1.022e+02 1.192e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-26 06:05:05,268 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488050 2023-11-26 06:05:07,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3253626.6666666665, ans=0.2 2023-11-26 06:05:12,705 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7100, loss[loss=0.09218, simple_loss=0.1297, pruned_loss=0.01871, audio_tagging_loss=0.008643, over 15528.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09012, pruned_loss=0.01253, audio_tagging_loss=0.008936, over 3050341.12 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:05:12,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3253693.3333333335, ans=0.1 2023-11-26 06:05:16,542 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.80 vs. 
limit=6.0 2023-11-26 06:05:20,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3253693.3333333335, ans=0.0 2023-11-26 06:05:21,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3253693.3333333335, ans=0.0 2023-11-26 06:05:35,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3253826.6666666665, ans=0.125 2023-11-26 06:05:40,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3253826.6666666665, ans=0.125 2023-11-26 06:05:49,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3253893.3333333335, ans=0.0 2023-11-26 06:06:02,061 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488100 2023-11-26 06:06:08,414 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7150, loss[loss=0.08469, simple_loss=0.1157, pruned_loss=0.01743, audio_tagging_loss=0.009408, over 15044.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08956, pruned_loss=0.01246, audio_tagging_loss=0.008926, over 3047182.44 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:06:11,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3254026.6666666665, ans=0.05 2023-11-26 06:06:28,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3254160.0, ans=0.0 2023-11-26 06:06:46,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3254226.6666666665, ans=0.125 2023-11-26 06:06:47,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3254226.6666666665, ans=0.0 2023-11-26 06:06:49,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3254226.6666666665, ans=0.04949747468305833 2023-11-26 06:06:54,818 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.635e+01 8.934e+01 9.396e+01 1.011e+02 1.220e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-26 06:06:56,987 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488150 2023-11-26 06:06:58,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3254293.3333333335, ans=0.2 2023-11-26 06:07:03,192 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7200, loss[loss=0.05933, simple_loss=0.08184, pruned_loss=0.009482, audio_tagging_loss=0.008928, over 15468.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08984, pruned_loss=0.01246, audio_tagging_loss=0.008998, over 3048307.94 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:07:03,798 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.65 vs. 
limit=22.5 2023-11-26 06:07:06,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3254360.0, ans=0.125 2023-11-26 06:07:07,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3254360.0, ans=0.125 2023-11-26 06:07:23,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3254426.6666666665, ans=0.125 2023-11-26 06:07:39,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3254560.0, ans=0.125 2023-11-26 06:07:48,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3254626.6666666665, ans=0.125 2023-11-26 06:07:52,239 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488200 2023-11-26 06:07:59,922 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7250, loss[loss=0.09282, simple_loss=0.1266, pruned_loss=0.02265, audio_tagging_loss=0.006861, over 15041.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09026, pruned_loss=0.01255, audio_tagging_loss=0.009048, over 3048296.32 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:08:04,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3254693.3333333335, ans=0.125 2023-11-26 06:08:13,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=3254760.0, ans=0.1 2023-11-26 06:08:15,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3254760.0, ans=0.05 2023-11-26 06:08:20,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3254760.0, ans=0.125 2023-11-26 06:08:24,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3254826.6666666665, ans=0.2 2023-11-26 06:08:35,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3254893.3333333335, ans=0.0 2023-11-26 06:08:47,425 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.575e+01 9.064e+01 9.788e+01 1.213e+02, threshold=1.813e+02, percent-clipped=0.0 2023-11-26 06:08:49,625 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488250 2023-11-26 06:08:56,396 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7300, loss[loss=0.06661, simple_loss=0.08184, pruned_loss=0.01596, audio_tagging_loss=0.009727, over 13654.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09042, pruned_loss=0.0126, audio_tagging_loss=0.008906, over 3048923.85 frames. ], batch size: 52, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:08:56,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3255026.6666666665, ans=0.0 2023-11-26 06:09:07,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3255093.3333333335, ans=0.0 2023-11-26 06:09:12,746 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.71 vs. 
limit=15.0 2023-11-26 06:09:34,855 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.50 vs. limit=15.0 2023-11-26 06:09:39,233 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 06:09:45,379 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488300 2023-11-26 06:09:45,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3255293.3333333335, ans=0.2 2023-11-26 06:09:51,714 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7350, loss[loss=0.05226, simple_loss=0.0754, pruned_loss=0.007543, audio_tagging_loss=0.007024, over 15665.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09079, pruned_loss=0.01265, audio_tagging_loss=0.008696, over 3050869.02 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:09:57,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3255360.0, ans=0.0 2023-11-26 06:10:00,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3255360.0, ans=0.125 2023-11-26 06:10:00,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3255360.0, ans=0.125 2023-11-26 06:10:06,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3255426.6666666665, ans=0.2 2023-11-26 06:10:39,581 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.650e+01 8.543e+01 9.108e+01 9.776e+01 1.189e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-26 06:10:40,722 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488350 2023-11-26 06:10:47,662 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7400, loss[loss=0.05745, simple_loss=0.07671, pruned_loss=0.0105, audio_tagging_loss=0.008595, over 14730.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08973, pruned_loss=0.01261, audio_tagging_loss=0.008699, over 3046538.43 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:10:54,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3255693.3333333335, ans=0.125 2023-11-26 06:11:10,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3255826.6666666665, ans=0.0 2023-11-26 06:11:18,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3255826.6666666665, ans=0.0 2023-11-26 06:11:18,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3255826.6666666665, ans=0.125 2023-11-26 06:11:27,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3255893.3333333335, ans=0.125 2023-11-26 06:11:37,325 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488400 2023-11-26 06:11:44,980 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7450, loss[loss=0.06687, simple_loss=0.09518, pruned_loss=0.01161, audio_tagging_loss=0.007669, over 16122.00 frames. 
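Note on the optim.py:476 lines: each reports min/25%/50%/75%/max of recent gradient norms plus a clipping threshold; in the entry logged at 06:10:39 above, the threshold equals Clipping_scale times the median (2.0 x 9.108e+01 ~ 1.822e+02), and percent-clipped counts how often the threshold was exceeded. A hedged sketch of that bookkeeping follows; the window of norms and the exact clipping rule are assumptions.

import torch

def clipping_stats(norms: torch.Tensor, clipping_scale: float = 2.0):
    # norms: 1-D tensor of recent per-step gradient norms (stand-in window).
    q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]                   # scale * median
    percent_clipped = 100.0 * (norms > threshold).float().mean()
    return q, threshold, percent_clipped

norms = torch.tensor([76.5, 85.4, 91.1, 97.8, 118.9])   # matches the quartiles above
q, thr, pct = clipping_stats(norms)
print(q.tolist(), float(thr), float(pct))               # threshold ~1.822e+02, 0% clipped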
], tot_loss[loss=0.06641, simple_loss=0.09042, pruned_loss=0.01265, audio_tagging_loss=0.008548, over 3041907.87 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:11:50,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3256026.6666666665, ans=0.0 2023-11-26 06:11:50,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3256026.6666666665, ans=0.125 2023-11-26 06:12:04,803 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.36 vs. limit=15.0 2023-11-26 06:12:30,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3256293.3333333335, ans=0.125 2023-11-26 06:12:32,358 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.158e+01 8.793e+01 9.296e+01 1.001e+02 1.337e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-26 06:12:32,981 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.64 vs. limit=22.5 2023-11-26 06:12:33,486 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488450 2023-11-26 06:12:39,871 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7500, loss[loss=0.05628, simple_loss=0.07867, pruned_loss=0.006985, audio_tagging_loss=0.009963, over 15027.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.0903, pruned_loss=0.01249, audio_tagging_loss=0.008566, over 3045915.58 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:13:14,590 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.99 vs. limit=15.0 2023-11-26 06:13:29,044 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488500 2023-11-26 06:13:35,298 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7550, loss[loss=0.0692, simple_loss=0.09446, pruned_loss=0.01369, audio_tagging_loss=0.008289, over 15552.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09035, pruned_loss=0.01259, audio_tagging_loss=0.008494, over 3047224.44 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:14:02,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3256826.6666666665, ans=0.125 2023-11-26 06:14:07,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3256826.6666666665, ans=0.0 2023-11-26 06:14:14,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=3256893.3333333335, ans=0.1 2023-11-26 06:14:16,066 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.15 vs. limit=12.0 2023-11-26 06:14:21,890 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.96 vs. 
limit=12.0 2023-11-26 06:14:23,931 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.557e+01 9.000e+01 9.495e+01 1.038e+02 1.345e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-26 06:14:25,073 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488550 2023-11-26 06:14:25,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3256960.0, ans=0.1 2023-11-26 06:14:31,447 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7600, loss[loss=0.07887, simple_loss=0.1083, pruned_loss=0.01689, audio_tagging_loss=0.007835, over 15193.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08981, pruned_loss=0.0126, audio_tagging_loss=0.008548, over 3049300.67 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:14:41,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3257026.6666666665, ans=0.2 2023-11-26 06:14:45,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3257093.3333333335, ans=0.2 2023-11-26 06:14:58,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3257160.0, ans=0.125 2023-11-26 06:15:02,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3257160.0, ans=0.0 2023-11-26 06:15:14,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3257226.6666666665, ans=0.1 2023-11-26 06:15:19,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3257293.3333333335, ans=0.04949747468305833 2023-11-26 06:15:21,424 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488600 2023-11-26 06:15:27,873 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7650, loss[loss=0.06895, simple_loss=0.09192, pruned_loss=0.01269, audio_tagging_loss=0.01029, over 16035.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08999, pruned_loss=0.0126, audio_tagging_loss=0.008617, over 3044186.65 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:15:28,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3257360.0, ans=0.0 2023-11-26 06:15:31,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3257360.0, ans=0.125 2023-11-26 06:15:37,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3257426.6666666665, ans=0.125 2023-11-26 06:15:44,940 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.62 vs. limit=15.0 2023-11-26 06:15:48,961 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0 2023-11-26 06:15:53,689 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.59 vs. 
limit=22.5 2023-11-26 06:15:58,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3257493.3333333335, ans=0.1 2023-11-26 06:15:59,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3257493.3333333335, ans=0.125 2023-11-26 06:16:00,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3257560.0, ans=0.05 2023-11-26 06:16:01,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3257560.0, ans=0.125 2023-11-26 06:16:16,676 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.210e+01 8.718e+01 9.418e+01 1.004e+02 2.180e+02, threshold=1.884e+02, percent-clipped=1.0 2023-11-26 06:16:16,776 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488650 2023-11-26 06:16:23,055 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7700, loss[loss=0.08085, simple_loss=0.1099, pruned_loss=0.01812, audio_tagging_loss=0.00776, over 15982.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09006, pruned_loss=0.01253, audio_tagging_loss=0.008613, over 3045257.14 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:16:26,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3257693.3333333335, ans=0.125 2023-11-26 06:16:55,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3257826.6666666665, ans=0.0 2023-11-26 06:17:07,088 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.07 vs. limit=12.0 2023-11-26 06:17:11,562 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 06:17:12,416 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488700 2023-11-26 06:17:19,383 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7750, loss[loss=0.07072, simple_loss=0.09635, pruned_loss=0.01448, audio_tagging_loss=0.008066, over 15169.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08976, pruned_loss=0.01254, audio_tagging_loss=0.008747, over 3045128.05 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:17:20,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3258026.6666666665, ans=0.1 2023-11-26 06:17:48,643 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.85 vs. limit=15.0 2023-11-26 06:18:07,436 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.52 vs. limit=10.0 2023-11-26 06:18:08,730 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.436e+01 8.599e+01 9.200e+01 9.734e+01 1.299e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-26 06:18:08,837 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488750 2023-11-26 06:18:15,113 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7800, loss[loss=0.05487, simple_loss=0.07714, pruned_loss=0.007947, audio_tagging_loss=0.008354, over 15928.00 frames. 
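Note on the loss[...] / tot_loss[...] entries: the objective decomposes into a simple (linear-joiner) transducer loss, a pruned transducer loss, and an audio-tagging loss. The printed totals are consistent with loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss; the 0.5 weight on the simple term is inferred from the logged numbers rather than read from the code.

def combine(simple_loss, pruned_loss, audio_tagging_loss,
            simple_scale=0.5, tagging_scale=1.0):       # scales inferred from the log
    return (simple_scale * simple_loss + pruned_loss
            + tagging_scale * audio_tagging_loss)

# Batch 7700 above: tot_loss[loss=0.06617, simple_loss=0.09006,
#                            pruned_loss=0.01253, audio_tagging_loss=0.008613]
print(combine(0.09006, 0.01253, 0.008613))              # 0.066173 ~= 0.06617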
], tot_loss[loss=0.06635, simple_loss=0.08999, pruned_loss=0.0126, audio_tagging_loss=0.00876, over 3043969.77 frames. ], batch size: 62, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:19:04,830 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488800 2023-11-26 06:19:10,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3258693.3333333335, ans=0.2 2023-11-26 06:19:11,385 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7850, loss[loss=0.04708, simple_loss=0.06391, pruned_loss=0.007803, audio_tagging_loss=0.007326, over 14039.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.08965, pruned_loss=0.01259, audio_tagging_loss=0.008836, over 3045961.56 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:19:17,927 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.98 vs. limit=15.0 2023-11-26 06:19:37,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3258826.6666666665, ans=0.035 2023-11-26 06:19:45,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3258893.3333333335, ans=0.0 2023-11-26 06:19:55,364 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0 2023-11-26 06:19:56,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3258960.0, ans=0.025 2023-11-26 06:19:58,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3258960.0, ans=0.125 2023-11-26 06:20:00,750 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.651e+01 8.695e+01 9.194e+01 9.770e+01 1.489e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-26 06:20:00,850 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488850 2023-11-26 06:20:01,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3258960.0, ans=0.0 2023-11-26 06:20:01,436 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.21 vs. limit=6.0 2023-11-26 06:20:05,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3258960.0, ans=0.125 2023-11-26 06:20:07,639 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7900, loss[loss=0.07536, simple_loss=0.1125, pruned_loss=0.011, audio_tagging_loss=0.008112, over 14544.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.09046, pruned_loss=0.01278, audio_tagging_loss=0.008941, over 3056965.76 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:20:07,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3259026.6666666665, ans=0.1 2023-11-26 06:20:08,154 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.10 vs. 
limit=6.0 2023-11-26 06:20:08,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3259026.6666666665, ans=0.125 2023-11-26 06:20:24,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3259093.3333333335, ans=0.125 2023-11-26 06:20:38,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3259160.0, ans=0.125 2023-11-26 06:20:57,368 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488900 2023-11-26 06:21:03,743 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 7950, loss[loss=0.08151, simple_loss=0.1066, pruned_loss=0.0171, audio_tagging_loss=0.01112, over 15742.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.0899, pruned_loss=0.01272, audio_tagging_loss=0.009133, over 3050143.40 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:21:08,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3259360.0, ans=0.1 2023-11-26 06:21:16,907 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 06:21:24,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3259493.3333333335, ans=0.0 2023-11-26 06:21:52,302 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.130e+01 8.753e+01 9.407e+01 1.023e+02 1.871e+02, threshold=1.881e+02, percent-clipped=1.0 2023-11-26 06:21:52,398 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 488950 2023-11-26 06:21:59,113 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8000, loss[loss=0.04312, simple_loss=0.04863, pruned_loss=0.006362, audio_tagging_loss=0.01245, over 17044.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08848, pruned_loss=0.01241, audio_tagging_loss=0.009259, over 3043383.13 frames. ], batch size: 67, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:22:12,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3259760.0, ans=0.1 2023-11-26 06:22:34,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3259893.3333333335, ans=0.125 2023-11-26 06:22:34,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3259893.3333333335, ans=0.0 2023-11-26 06:22:48,466 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489000 2023-11-26 06:22:48,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3259960.0, ans=0.0 2023-11-26 06:22:53,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3259960.0, ans=0.0 2023-11-26 06:22:55,045 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8050, loss[loss=0.06099, simple_loss=0.08677, pruned_loss=0.01021, audio_tagging_loss=0.007399, over 14607.00 frames. 
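Note on the "Exclude cut" WARNING above: the dummy 1-second AudioSet clip has 100 feature frames, which the encoder's subsampling reduces to 23, fewer than its 24 BPE tokens; a transducer cannot emit more tokens than it has frames (it needs T >= U), so the cut is dropped. A minimal sketch of such a filter, with the exact rule assumed from the message:

def keep_cut(frames_after_subsampling: int, num_tokens: int) -> bool:
    # An RNN-T / pruned-transducer alignment needs at least one frame per
    # output token; shorter cuts would make the loss undefined.
    return frames_after_subsampling >= num_tokens

# The excluded cut above: 23 frames after subsampling vs. 24 tokens.
print(keep_cut(23, 24))     # False -> "Exclude cut with ID ... from training."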
], tot_loss[loss=0.06552, simple_loss=0.0881, pruned_loss=0.01226, audio_tagging_loss=0.009215, over 3043567.79 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:23:13,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3260093.3333333335, ans=0.125 2023-11-26 06:23:36,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3260226.6666666665, ans=0.2 2023-11-26 06:23:44,643 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489050 2023-11-26 06:23:46,128 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.345e+01 8.810e+01 9.339e+01 9.946e+01 1.266e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 06:23:46,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3260293.3333333335, ans=0.125 2023-11-26 06:23:51,456 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8100, loss[loss=0.06497, simple_loss=0.07935, pruned_loss=0.0168, audio_tagging_loss=0.008492, over 14508.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08887, pruned_loss=0.01241, audio_tagging_loss=0.009064, over 3042891.05 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:24:00,526 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.55 vs. limit=15.0 2023-11-26 06:24:03,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3260426.6666666665, ans=0.125 2023-11-26 06:24:06,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3260426.6666666665, ans=0.125 2023-11-26 06:24:18,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3260493.3333333335, ans=0.125 2023-11-26 06:24:25,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3260560.0, ans=0.125 2023-11-26 06:24:37,993 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.12 vs. limit=15.0 2023-11-26 06:24:40,655 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489100 2023-11-26 06:24:46,947 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8150, loss[loss=0.07216, simple_loss=0.09567, pruned_loss=0.0134, audio_tagging_loss=0.01093, over 14465.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09015, pruned_loss=0.01268, audio_tagging_loss=0.00886, over 3044211.64 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:25:11,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3260826.6666666665, ans=0.125 2023-11-26 06:25:11,910 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.00 vs. 
limit=15.0 2023-11-26 06:25:35,894 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489150 2023-11-26 06:25:37,969 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.913e+01 8.636e+01 9.236e+01 1.005e+02 1.829e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-26 06:25:38,461 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2023-11-26 06:25:43,410 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8200, loss[loss=0.07699, simple_loss=0.1024, pruned_loss=0.01706, audio_tagging_loss=0.008716, over 15839.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09071, pruned_loss=0.01279, audio_tagging_loss=0.00879, over 3041413.51 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 06:25:44,459 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 06:25:56,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3261093.3333333335, ans=0.125 2023-11-26 06:25:58,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3261093.3333333335, ans=0.05 2023-11-26 06:26:16,810 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 06:26:31,700 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=15.0 2023-11-26 06:26:33,281 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489200 2023-11-26 06:26:40,313 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8250, loss[loss=0.05032, simple_loss=0.06785, pruned_loss=0.007845, audio_tagging_loss=0.00855, over 16304.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.08992, pruned_loss=0.01262, audio_tagging_loss=0.008781, over 3044796.35 frames. ], batch size: 62, lr: 1.64e-03, grad_scale: 8.0 2023-11-26 06:26:55,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3261426.6666666665, ans=0.125 2023-11-26 06:27:03,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3261493.3333333335, ans=0.125 2023-11-26 06:27:13,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3261560.0, ans=0.125 2023-11-26 06:27:21,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3261560.0, ans=0.1 2023-11-26 06:27:24,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3261626.6666666665, ans=0.2 2023-11-26 06:27:28,323 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.63 vs. 
limit=15.0 2023-11-26 06:27:29,816 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489250 2023-11-26 06:27:31,868 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.668e+01 8.764e+01 9.523e+01 1.021e+02 1.378e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-26 06:27:36,104 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8300, loss[loss=0.0939, simple_loss=0.135, pruned_loss=0.02037, audio_tagging_loss=0.00601, over 16021.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09098, pruned_loss=0.01276, audio_tagging_loss=0.008734, over 3049808.02 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 8.0 2023-11-26 06:27:48,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3261760.0, ans=0.0 2023-11-26 06:28:06,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3261826.6666666665, ans=0.1 2023-11-26 06:28:11,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3261893.3333333335, ans=0.1 2023-11-26 06:28:25,256 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489300 2023-11-26 06:28:32,186 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8350, loss[loss=0.06404, simple_loss=0.09328, pruned_loss=0.01074, audio_tagging_loss=0.006668, over 14599.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09141, pruned_loss=0.01271, audio_tagging_loss=0.008605, over 3050322.98 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 8.0 2023-11-26 06:28:42,945 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.74 vs. limit=10.0 2023-11-26 06:28:51,981 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.43 vs. limit=15.0 2023-11-26 06:28:56,132 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.85 vs. limit=22.5 2023-11-26 06:29:14,931 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.69 vs. limit=15.0 2023-11-26 06:29:16,691 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 06:29:21,861 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489350 2023-11-26 06:29:23,910 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.424e+01 8.707e+01 9.107e+01 9.856e+01 1.432e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-26 06:29:28,770 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8400, loss[loss=0.06457, simple_loss=0.08269, pruned_loss=0.01177, audio_tagging_loss=0.01146, over 15047.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09166, pruned_loss=0.01287, audio_tagging_loss=0.008549, over 3054528.40 frames. 
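Note on the scaling.py:1022 Whitening lines: each compares a per-module metric against a limit (e.g. metric=4.00 vs. limit=15.0), where the metric gauges how far the activations' covariance is from a multiple of the identity and a whitening penalty is applied only when the limit is exceeded. A rough sketch of one plausible such metric; the exact icefall formula may differ.

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels). Roughly 1.0 for "white" activations
    # (covariance proportional to the identity), larger when channels are
    # correlated or unevenly scaled.
    x = x - x.mean(dim=0)
    cov = x.t() @ x / x.shape[0]
    mean_diag = cov.diag().mean()
    mean_sq = (cov ** 2).sum() / cov.shape[0]
    return mean_sq / (mean_diag ** 2 + 1e-20)

x = torch.randn(2000, 384)           # approximately white features
print(float(whitening_metric(x)))    # close to 1, well under limit=15.0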
], batch size: 59, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:29:35,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3262360.0, ans=0.05 2023-11-26 06:29:38,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3262426.6666666665, ans=0.0 2023-11-26 06:29:43,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3262426.6666666665, ans=0.125 2023-11-26 06:29:47,143 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 06:29:56,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3262493.3333333335, ans=0.0 2023-11-26 06:30:10,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3262560.0, ans=0.125 2023-11-26 06:30:17,895 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489400 2023-11-26 06:30:24,463 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8450, loss[loss=0.0504, simple_loss=0.0654, pruned_loss=0.006837, audio_tagging_loss=0.01086, over 15082.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09166, pruned_loss=0.01274, audio_tagging_loss=0.00856, over 3055605.95 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:30:29,236 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.33 vs. limit=15.0 2023-11-26 06:30:50,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3262826.6666666665, ans=0.125 2023-11-26 06:31:00,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3262893.3333333335, ans=0.0 2023-11-26 06:31:01,912 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 06:31:10,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3262960.0, ans=0.0 2023-11-26 06:31:13,401 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489450 2023-11-26 06:31:15,426 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.299e+01 8.909e+01 9.451e+01 1.011e+02 1.331e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-26 06:31:20,192 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8500, loss[loss=0.05571, simple_loss=0.07562, pruned_loss=0.009954, audio_tagging_loss=0.007943, over 14568.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09163, pruned_loss=0.01273, audio_tagging_loss=0.008577, over 3050573.81 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:31:25,096 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.41 vs. limit=15.0 2023-11-26 06:31:56,468 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.39 vs. 
limit=15.0 2023-11-26 06:32:05,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3263293.3333333335, ans=0.2 2023-11-26 06:32:05,636 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.36 vs. limit=6.0 2023-11-26 06:32:09,406 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489500 2023-11-26 06:32:15,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3263360.0, ans=0.2 2023-11-26 06:32:16,189 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8550, loss[loss=0.06245, simple_loss=0.08343, pruned_loss=0.01145, audio_tagging_loss=0.009282, over 14574.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09032, pruned_loss=0.0125, audio_tagging_loss=0.008645, over 3041509.18 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:32:16,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3263360.0, ans=0.125 2023-11-26 06:32:21,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3263360.0, ans=0.0 2023-11-26 06:32:22,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3263360.0, ans=0.125 2023-11-26 06:32:35,958 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2023-11-26 06:32:55,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3263560.0, ans=0.2 2023-11-26 06:33:05,634 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489550 2023-11-26 06:33:06,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3263626.6666666665, ans=0.1 2023-11-26 06:33:07,664 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.400e+01 8.883e+01 9.307e+01 9.956e+01 1.247e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 06:33:11,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3263693.3333333335, ans=0.0 2023-11-26 06:33:11,968 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8600, loss[loss=0.08346, simple_loss=0.1146, pruned_loss=0.01834, audio_tagging_loss=0.007792, over 15860.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09062, pruned_loss=0.01249, audio_tagging_loss=0.008685, over 3048727.52 frames. 
], batch size: 59, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:33:41,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3263826.6666666665, ans=0.0 2023-11-26 06:34:00,558 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489600 2023-11-26 06:34:02,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3263960.0, ans=0.125 2023-11-26 06:34:04,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3263960.0, ans=0.2 2023-11-26 06:34:07,034 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8650, loss[loss=0.07261, simple_loss=0.1059, pruned_loss=0.01135, audio_tagging_loss=0.008311, over 15384.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09013, pruned_loss=0.01247, audio_tagging_loss=0.008814, over 3044957.20 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:34:07,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3264026.6666666665, ans=0.125 2023-11-26 06:34:21,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3264093.3333333335, ans=0.1 2023-11-26 06:34:49,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3264226.6666666665, ans=0.1 2023-11-26 06:34:56,611 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489650 2023-11-26 06:34:58,651 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.548e+01 8.575e+01 9.501e+01 1.015e+02 1.798e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 06:35:03,401 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8700, loss[loss=0.07687, simple_loss=0.1093, pruned_loss=0.01606, audio_tagging_loss=0.006151, over 15045.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08933, pruned_loss=0.01234, audio_tagging_loss=0.008951, over 3044624.92 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:35:04,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3264360.0, ans=0.1 2023-11-26 06:35:06,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3264360.0, ans=0.125 2023-11-26 06:35:15,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3264426.6666666665, ans=0.1 2023-11-26 06:35:36,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3264560.0, ans=0.2 2023-11-26 06:35:36,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3264560.0, ans=0.125 2023-11-26 06:35:52,873 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489700 2023-11-26 06:35:59,777 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8750, loss[loss=0.07383, simple_loss=0.1045, pruned_loss=0.012, audio_tagging_loss=0.009569, over 14715.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09059, pruned_loss=0.01248, audio_tagging_loss=0.009043, over 3048491.37 frames. 
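Note on grad_scale: its value in the loss entries moves in powers of two (8.0 around batches 8200-8350, back to 16.0 at batch 8400 and 32.0 by batch 8800), the signature of dynamic fp16 loss scaling: the scaler halves the scale when a step overflows and doubles it after a long run of clean steps. The sketch below uses torch.cuda.amp.GradScaler's standard knobs; the toy model and loop are illustrative, not train_asr.py's.

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)
model = torch.nn.Linear(10, 10)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(3):
    opt.zero_grad()
    loss = model(torch.randn(4, 10)).pow(2).mean()
    scaler.scale(loss).backward()    # backward on the scaled loss
    scaler.step(opt)                 # silently skips the step on inf/nan grads
    scaler.update()                  # halve on overflow, double after clean runs
    print(scaler.get_scale())        # stays a power of two, like grad_scale above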
], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:36:43,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=3264960.0, ans=0.1 2023-11-26 06:36:47,073 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.00 vs. limit=15.0 2023-11-26 06:36:48,811 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489750 2023-11-26 06:36:50,562 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.99 vs. limit=12.0 2023-11-26 06:36:50,785 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.923e+01 8.719e+01 9.577e+01 1.009e+02 1.331e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-26 06:36:52,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3264960.0, ans=0.125 2023-11-26 06:36:52,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3264960.0, ans=0.2 2023-11-26 06:36:55,071 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8800, loss[loss=0.06797, simple_loss=0.0922, pruned_loss=0.0137, audio_tagging_loss=0.008171, over 15178.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09145, pruned_loss=0.01265, audio_tagging_loss=0.009086, over 3047677.10 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:36:55,526 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.49 vs. limit=15.0 2023-11-26 06:37:06,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3265093.3333333335, ans=0.0 2023-11-26 06:37:14,887 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.64 vs. limit=15.0 2023-11-26 06:37:37,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3265226.6666666665, ans=0.2 2023-11-26 06:37:44,508 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489800 2023-11-26 06:37:44,681 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 06:37:51,521 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8850, loss[loss=0.09844, simple_loss=0.1338, pruned_loss=0.02622, audio_tagging_loss=0.005322, over 16270.00 frames. ], tot_loss[loss=0.06739, simple_loss=0.09149, pruned_loss=0.01261, audio_tagging_loss=0.009042, over 3054888.64 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:38:01,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3265426.6666666665, ans=0.1 2023-11-26 06:38:02,763 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 06:38:03,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3265426.6666666665, ans=0.2 2023-11-26 06:38:09,695 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 06:38:11,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3265426.6666666665, ans=0.0 2023-11-26 06:38:30,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3265560.0, ans=0.125 2023-11-26 06:38:40,968 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489850 2023-11-26 06:38:43,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3265626.6666666665, ans=0.1 2023-11-26 06:38:44,014 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.272e+01 8.713e+01 9.492e+01 1.007e+02 1.202e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-26 06:38:44,206 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 06:38:47,328 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8900, loss[loss=0.0505, simple_loss=0.06238, pruned_loss=0.007689, audio_tagging_loss=0.01162, over 15292.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09069, pruned_loss=0.01246, audio_tagging_loss=0.008947, over 3055889.01 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:38:53,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3265693.3333333335, ans=0.0 2023-11-26 06:39:09,810 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 06:39:13,486 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.46 vs. limit=15.0 2023-11-26 06:39:36,869 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489900 2023-11-26 06:39:42,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3266026.6666666665, ans=0.125 2023-11-26 06:39:43,102 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 8950, loss[loss=0.0815, simple_loss=0.1089, pruned_loss=0.01825, audio_tagging_loss=0.008787, over 14973.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09101, pruned_loss=0.01246, audio_tagging_loss=0.008769, over 3054685.39 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:40:10,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3266160.0, ans=0.0 2023-11-26 06:40:18,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3266226.6666666665, ans=0.0 2023-11-26 06:40:32,192 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 489950 2023-11-26 06:40:35,291 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.267e+01 8.881e+01 9.559e+01 9.968e+01 1.237e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-26 06:40:38,548 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9000, loss[loss=0.06707, simple_loss=0.09695, pruned_loss=0.01224, audio_tagging_loss=0.006354, over 15201.00 frames. 
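Note on the scaling.py:1118 WithLoss lines: they report the sum of an auxiliary penalty attached to a module's output (loss-sum=0.000e+00 whenever the penalty is inactive). The pattern, sketched below as an assumption about the mechanism, is an autograd function that returns its input unchanged but routes a gradient of one into the penalty on the backward pass, so the penalty joins the total loss without altering forward values.

import torch

class WithLoss(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, aux_loss):
        ctx.save_for_backward(aux_loss)
        print(f"WithLoss: loss-sum={aux_loss.sum():.3e}")   # as in the log lines
        return x                                            # forward is unchanged

    @staticmethod
    def backward(ctx, dx):
        (aux_loss,) = ctx.saved_tensors
        # A gradient of 1 w.r.t. the penalty effectively adds it to the loss.
        return dx, torch.ones_like(aux_loss)

x = torch.randn(5, requires_grad=True)
penalty = (x.abs() - 2.0).clamp(min=0.0)    # zero while x stays in range
y = WithLoss.apply(x, penalty)
y.sum().backward()                          # penalty gradients also reach x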
], tot_loss[loss=0.06615, simple_loss=0.09019, pruned_loss=0.01234, audio_tagging_loss=0.008709, over 3053242.73 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:40:38,550 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-26 06:41:01,172 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.9162, 1.6441, 3.5133, 3.0344, 2.9894, 3.0814, 2.9067, 3.2778], device='cuda:0') 2023-11-26 06:41:10,850 INFO [train_asr.py:1267] (0/4) Epoch 41, validation: loss=0.05835, simple_loss=0.05057, pruned_loss=0.005166, audio_tagging_loss=0.0279, over 4681554.00 frames. 2023-11-26 06:41:10,850 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-26 06:41:22,399 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.08 vs. limit=15.0 2023-11-26 06:41:59,938 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490000 2023-11-26 06:42:04,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3266626.6666666665, ans=0.125 2023-11-26 06:42:06,616 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9050, loss[loss=0.08099, simple_loss=0.112, pruned_loss=0.01925, audio_tagging_loss=0.005744, over 15193.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09072, pruned_loss=0.01252, audio_tagging_loss=0.008728, over 3049903.90 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:42:14,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3266693.3333333335, ans=0.1 2023-11-26 06:42:19,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3266760.0, ans=0.125 2023-11-26 06:42:34,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3266826.6666666665, ans=0.2 2023-11-26 06:42:55,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3266960.0, ans=0.125 2023-11-26 06:42:56,362 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490050 2023-11-26 06:42:56,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3266960.0, ans=0.1 2023-11-26 06:42:59,336 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.458e+01 8.756e+01 9.461e+01 1.032e+02 1.293e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-26 06:43:03,126 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9100, loss[loss=0.05383, simple_loss=0.06908, pruned_loss=0.007714, audio_tagging_loss=0.01157, over 15334.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08902, pruned_loss=0.01227, audio_tagging_loss=0.008764, over 3056511.19 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:43:07,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3267026.6666666665, ans=0.04949747468305833 2023-11-26 06:43:46,063 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.06 vs. 
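Note on the zipformer.py:1877 validation dump: attn_weights_entropy lists one value per attention head of the named layer, i.e. the entropy -(p * log p) of each head's attention distribution averaged over query positions. Values near the log(seq_len) maximum mean diffuse attention; small values (like the 1.6441 head above) mean peaky attention. A hedged sketch with illustrative shapes:

import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, seq_len, seq_len); each row is a softmax distribution.
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)
    return ent.mean(dim=-1)                   # one entropy per head

attn = torch.softmax(torch.randn(8, 50, 50), dim=-1)
print(attn_weights_entropy(attn))             # a bit under the log(50) ~3.9 maximum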
limit=10.0 2023-11-26 06:43:46,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3267293.3333333335, ans=0.0 2023-11-26 06:43:52,319 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.38 vs. limit=22.5 2023-11-26 06:43:52,690 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490100 2023-11-26 06:43:58,970 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9150, loss[loss=0.06265, simple_loss=0.08826, pruned_loss=0.01035, audio_tagging_loss=0.00817, over 15695.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08928, pruned_loss=0.01235, audio_tagging_loss=0.008661, over 3052324.19 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:43:59,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3267360.0, ans=0.0 2023-11-26 06:44:04,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3267360.0, ans=0.125 2023-11-26 06:44:20,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3267493.3333333335, ans=0.125 2023-11-26 06:44:23,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3267493.3333333335, ans=0.2 2023-11-26 06:44:31,851 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.89 vs. limit=15.0 2023-11-26 06:44:36,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3267560.0, ans=0.125 2023-11-26 06:44:47,818 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490150 2023-11-26 06:44:50,862 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.417e+01 8.875e+01 9.458e+01 1.013e+02 1.353e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-26 06:44:54,043 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9200, loss[loss=0.07204, simple_loss=0.09691, pruned_loss=0.0158, audio_tagging_loss=0.007791, over 14134.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08984, pruned_loss=0.01243, audio_tagging_loss=0.008621, over 3047560.61 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:45:06,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3267760.0, ans=0.125 2023-11-26 06:45:07,201 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.22 vs. limit=15.0 2023-11-26 06:45:37,074 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.70 vs. 
limit=8.0 2023-11-26 06:45:38,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3267960.0, ans=0.125 2023-11-26 06:45:39,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3267960.0, ans=0.125 2023-11-26 06:45:43,802 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490200 2023-11-26 06:45:51,042 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9250, loss[loss=0.05892, simple_loss=0.08049, pruned_loss=0.009732, audio_tagging_loss=0.008945, over 15327.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08974, pruned_loss=0.01241, audio_tagging_loss=0.008607, over 3049614.54 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:45:53,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3268026.6666666665, ans=0.1 2023-11-26 06:46:07,134 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.19 vs. limit=15.0 2023-11-26 06:46:25,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3268226.6666666665, ans=0.125 2023-11-26 06:46:39,737 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490250 2023-11-26 06:46:41,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3268293.3333333335, ans=0.07 2023-11-26 06:46:43,458 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.386e+01 8.603e+01 9.080e+01 9.924e+01 1.383e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-26 06:46:46,745 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9300, loss[loss=0.06652, simple_loss=0.08894, pruned_loss=0.01226, audio_tagging_loss=0.009791, over 14545.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08895, pruned_loss=0.01221, audio_tagging_loss=0.008621, over 3047959.51 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:47:01,200 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.26 vs. limit=12.0 2023-11-26 06:47:11,060 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 06:47:22,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3268560.0, ans=0.0 2023-11-26 06:47:30,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3268626.6666666665, ans=0.125 2023-11-26 06:47:35,439 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490300 2023-11-26 06:47:41,749 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9350, loss[loss=0.0608, simple_loss=0.07625, pruned_loss=0.01113, audio_tagging_loss=0.01154, over 14357.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08868, pruned_loss=0.0122, audio_tagging_loss=0.008684, over 3036433.19 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:48:09,995 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.46 vs. 
limit=15.0
2023-11-26 06:48:10,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3268826.6666666665, ans=0.125
2023-11-26 06:48:18,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3268893.3333333335, ans=0.125
2023-11-26 06:48:31,062 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490350
2023-11-26 06:48:34,655 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 8.924e+01 9.559e+01 1.022e+02 1.389e+02, threshold=1.912e+02, percent-clipped=0.0
2023-11-26 06:48:37,928 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9400, loss[loss=0.05907, simple_loss=0.07691, pruned_loss=0.01077, audio_tagging_loss=0.00985, over 15654.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08966, pruned_loss=0.01243, audio_tagging_loss=0.008784, over 3043327.59 frames. ], batch size: 62, lr: 1.64e-03, grad_scale: 32.0
2023-11-26 06:48:43,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3269026.6666666665, ans=0.125
2023-11-26 06:48:44,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3269026.6666666665, ans=0.0
2023-11-26 06:48:47,359 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.14 vs. limit=22.5
2023-11-26 06:48:48,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3269093.3333333335, ans=0.0
2023-11-26 06:48:56,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3269093.3333333335, ans=0.125
2023-11-26 06:49:07,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3269160.0, ans=0.0
2023-11-26 06:49:27,317 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490400
2023-11-26 06:49:28,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3269293.3333333335, ans=0.125
2023-11-26 06:49:32,332 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 06:49:33,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3269360.0, ans=0.0
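The WARNING above shows the training loop's length filter at work: the AudioSet placeholder transcript tokenizes to 24 BPE tokens, but the 100-frame cut shrinks to 23 frames after the encoder's subsampling, and the pruned transducer loss requires at least as many encoder frames as output tokens. A minimal sketch of such a filter, assuming a ((T - 7) // 2) // 2 frame-rate reduction, which reproduces the logged 100 -> 23 mapping (the function is illustrative, not the icefall API):

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        """False for cuts whose encoder output is shorter than the token sequence."""
        # Assumed subsampling formula; it matches the 100 -> 23 figures in the log.
        frames_after_subsampling = ((num_frames - 7) // 2) // 2
        return frames_after_subsampling >= num_tokens

    print(keep_cut(100, 24))  # False -> the cut is excluded, as warned above

2023-11-26 06:49:34,413 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9450, loss[loss=0.07139, simple_loss=0.1046, pruned_loss=0.0113, audio_tagging_loss=0.007813, over 15763.00 frames.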
], batch size: 57, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:49:34,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3269360.0, ans=0.125 2023-11-26 06:50:23,256 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490450 2023-11-26 06:50:26,315 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.546e+01 8.849e+01 9.435e+01 1.031e+02 1.248e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 06:50:29,511 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9500, loss[loss=0.05328, simple_loss=0.06678, pruned_loss=0.006754, audio_tagging_loss=0.01313, over 14507.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08978, pruned_loss=0.01247, audio_tagging_loss=0.008915, over 3045464.54 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:50:41,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3269760.0, ans=0.0 2023-11-26 06:51:14,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3269960.0, ans=0.125 2023-11-26 06:51:15,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3269960.0, ans=0.125 2023-11-26 06:51:18,663 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490500 2023-11-26 06:51:18,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3269960.0, ans=0.09899494936611666 2023-11-26 06:51:22,846 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.93 vs. limit=15.0 2023-11-26 06:51:25,483 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9550, loss[loss=0.06312, simple_loss=0.07769, pruned_loss=0.01322, audio_tagging_loss=0.01105, over 14738.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.08967, pruned_loss=0.01237, audio_tagging_loss=0.008991, over 3052089.85 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:51:49,111 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.91 vs. limit=15.0 2023-11-26 06:51:56,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3270160.0, ans=0.2 2023-11-26 06:51:59,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3270226.6666666665, ans=0.1 2023-11-26 06:52:01,683 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=22.5 2023-11-26 06:52:15,542 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490550 2023-11-26 06:52:18,589 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.011e+01 8.931e+01 9.591e+01 1.034e+02 1.211e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-26 06:52:22,446 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9600, loss[loss=0.06355, simple_loss=0.07999, pruned_loss=0.01374, audio_tagging_loss=0.009813, over 15017.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08928, pruned_loss=0.01246, audio_tagging_loss=0.009038, over 3049067.22 frames. 
], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:52:45,324 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.01 vs. limit=10.0 2023-11-26 06:53:02,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3270560.0, ans=10.0 2023-11-26 06:53:05,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3270560.0, ans=0.1 2023-11-26 06:53:11,611 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490600 2023-11-26 06:53:18,179 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9650, loss[loss=0.06866, simple_loss=0.09349, pruned_loss=0.01236, audio_tagging_loss=0.009553, over 16120.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08932, pruned_loss=0.01245, audio_tagging_loss=0.00903, over 3049077.93 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:53:18,857 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=22.5 2023-11-26 06:53:47,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3270826.6666666665, ans=0.0 2023-11-26 06:53:48,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3270826.6666666665, ans=0.1 2023-11-26 06:53:49,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3270826.6666666665, ans=0.1 2023-11-26 06:53:59,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3270893.3333333335, ans=0.125 2023-11-26 06:54:06,908 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.92 vs. limit=15.0 2023-11-26 06:54:07,200 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490650 2023-11-26 06:54:10,212 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.629e+01 9.120e+01 1.007e+02 1.405e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-26 06:54:13,991 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9700, loss[loss=0.07001, simple_loss=0.09238, pruned_loss=0.01752, audio_tagging_loss=0.006291, over 14697.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08914, pruned_loss=0.01249, audio_tagging_loss=0.00896, over 3040964.85 frames. 
], batch size: 54, lr: 1.64e-03, grad_scale: 32.0
2023-11-26 06:54:30,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3271093.3333333335, ans=0.125
2023-11-26 06:54:32,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3271093.3333333335, ans=0.0
2023-11-26 06:54:43,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3271160.0, ans=0.0
2023-11-26 06:54:50,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3271226.6666666665, ans=0.125
2023-11-26 06:54:50,304 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.84 vs. limit=15.0
2023-11-26 06:55:00,735 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.87 vs. limit=15.0
2023-11-26 06:55:02,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3271293.3333333335, ans=0.125
2023-11-26 06:55:03,195 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490700
2023-11-26 06:55:10,694 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9750, loss[loss=0.0689, simple_loss=0.09272, pruned_loss=0.01228, audio_tagging_loss=0.01026, over 15498.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08903, pruned_loss=0.01231, audio_tagging_loss=0.008837, over 3051207.40 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 32.0
2023-11-26 06:55:10,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3271360.0, ans=0.1
2023-11-26 06:55:17,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3271360.0, ans=0.125
2023-11-26 06:55:23,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3271426.6666666665, ans=0.0
2023-11-26 06:55:28,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3271426.6666666665, ans=0.0
2023-11-26 06:55:59,908 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490750
2023-11-26 06:56:01,635 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.12 vs. limit=15.0
2023-11-26 06:56:03,973 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.324e+01 8.706e+01 9.282e+01 1.012e+02 1.180e+02, threshold=1.856e+02, percent-clipped=0.0
2023-11-26 06:56:04,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3271626.6666666665, ans=0.0
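The optim.py lines summarize recently observed gradient norms: the five values are the min / 25% / 50% / 75% / max over a recent window, and the clipping threshold tracks Clipping_scale times the median (here 2.0 * 9.282e+01 = 1.856e+02), so only genuine outliers get clipped; percent-clipped=0.0 means no gradient in this window was scaled down. A rough sketch of that bookkeeping under a simple sliding-window assumption (not the optimizer's actual implementation):

    import statistics
    from collections import deque

    class GradNormClipper:
        """Sketch: clip gradients to clipping_scale x median of recent norms."""

        def __init__(self, clipping_scale: float = 2.0, window: int = 400):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)  # sliding window of grad norms

        def update(self, grad_norm: float) -> float:
            """Record a norm; return the factor to scale the gradient by."""
            self.norms.append(grad_norm)
            threshold = self.clipping_scale * statistics.median(self.norms)
            if grad_norm == 0.0:
                return 1.0
            return min(1.0, threshold / grad_norm)  # < 1.0 only when clipped

2023-11-26 06:56:06,096 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9800, loss[loss=0.06911, simple_loss=0.09537, pruned_loss=0.01369, audio_tagging_loss=0.007731, over 15051.00 frames.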
], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:56:10,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3271693.3333333335, ans=0.125 2023-11-26 06:56:10,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3271693.3333333335, ans=0.2 2023-11-26 06:56:50,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3271960.0, ans=0.0 2023-11-26 06:56:55,064 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 06:56:55,091 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490800 2023-11-26 06:56:57,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3271960.0, ans=0.125 2023-11-26 06:57:01,816 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9850, loss[loss=0.06984, simple_loss=0.09818, pruned_loss=0.01039, audio_tagging_loss=0.01037, over 14278.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.08982, pruned_loss=0.01263, audio_tagging_loss=0.008709, over 3048440.76 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:57:03,154 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 06:57:19,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3272093.3333333335, ans=0.125 2023-11-26 06:57:25,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3272160.0, ans=0.125 2023-11-26 06:57:26,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3272160.0, ans=0.0 2023-11-26 06:57:33,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3272160.0, ans=0.0 2023-11-26 06:57:40,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3272226.6666666665, ans=0.0 2023-11-26 06:57:40,703 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.62 vs. 
limit=15.0 2023-11-26 06:57:48,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3272293.3333333335, ans=0.125 2023-11-26 06:57:48,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3272293.3333333335, ans=0.125 2023-11-26 06:57:51,701 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490850 2023-11-26 06:57:56,501 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.954e+01 8.658e+01 9.556e+01 1.029e+02 1.537e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 06:57:58,702 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9900, loss[loss=0.07135, simple_loss=0.102, pruned_loss=0.0111, audio_tagging_loss=0.009247, over 15974.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.0903, pruned_loss=0.0126, audio_tagging_loss=0.008811, over 3039918.54 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:58:06,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3272360.0, ans=0.125 2023-11-26 06:58:18,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3272426.6666666665, ans=0.125 2023-11-26 06:58:34,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3272560.0, ans=0.125 2023-11-26 06:58:48,440 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490900 2023-11-26 06:58:55,383 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 9950, loss[loss=0.06743, simple_loss=0.09117, pruned_loss=0.01453, audio_tagging_loss=0.00732, over 15582.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08934, pruned_loss=0.01246, audio_tagging_loss=0.008775, over 3046728.61 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:59:09,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3272760.0, ans=0.125 2023-11-26 06:59:10,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3272760.0, ans=0.0 2023-11-26 06:59:10,689 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.51 vs. limit=10.0 2023-11-26 06:59:16,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3272826.6666666665, ans=0.0 2023-11-26 06:59:27,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3272893.3333333335, ans=0.07 2023-11-26 06:59:29,324 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.92 vs. limit=22.5 2023-11-26 06:59:44,405 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 490950 2023-11-26 06:59:48,561 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.315e+01 8.546e+01 9.420e+01 1.008e+02 1.364e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 06:59:50,741 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10000, loss[loss=0.07499, simple_loss=0.1043, pruned_loss=0.01597, audio_tagging_loss=0.006861, over 17186.00 frames. 
], tot_loss[loss=0.0657, simple_loss=0.08912, pruned_loss=0.01239, audio_tagging_loss=0.008752, over 3044100.60 frames. ], batch size: 64, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:00:10,489 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:00:12,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3273160.0, ans=0.125 2023-11-26 07:00:13,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3273160.0, ans=0.125 2023-11-26 07:00:24,492 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.04 vs. limit=12.0 2023-11-26 07:00:28,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3273226.6666666665, ans=0.0 2023-11-26 07:00:40,158 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491000 2023-11-26 07:00:44,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3273293.3333333335, ans=0.0 2023-11-26 07:00:46,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3273360.0, ans=0.0 2023-11-26 07:00:47,336 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10050, loss[loss=0.05477, simple_loss=0.06635, pruned_loss=0.0113, audio_tagging_loss=0.0103, over 15467.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.0893, pruned_loss=0.01235, audio_tagging_loss=0.008755, over 3042939.04 frames. ], batch size: 61, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:00:56,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3273360.0, ans=0.0 2023-11-26 07:01:01,208 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.68 vs. limit=15.0 2023-11-26 07:01:01,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3273426.6666666665, ans=0.0 2023-11-26 07:01:16,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3273493.3333333335, ans=0.125 2023-11-26 07:01:22,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3273560.0, ans=0.125 2023-11-26 07:01:36,854 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491050 2023-11-26 07:01:40,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3273626.6666666665, ans=0.0 2023-11-26 07:01:40,727 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.22 vs. limit=15.0 2023-11-26 07:01:41,059 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.461e+01 9.073e+01 9.880e+01 1.259e+02, threshold=1.815e+02, percent-clipped=0.0 2023-11-26 07:01:43,275 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10100, loss[loss=0.07004, simple_loss=0.09734, pruned_loss=0.0119, audio_tagging_loss=0.00946, over 15053.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08885, pruned_loss=0.01217, audio_tagging_loss=0.008773, over 3046534.46 frames. 
], batch size: 59, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:02:01,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3273760.0, ans=0.0 2023-11-26 07:02:09,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3273826.6666666665, ans=0.1 2023-11-26 07:02:14,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3273826.6666666665, ans=0.125 2023-11-26 07:02:26,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3273893.3333333335, ans=0.2 2023-11-26 07:02:28,944 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 07:02:31,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3273960.0, ans=0.125 2023-11-26 07:02:32,760 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491100 2023-11-26 07:02:39,074 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10150, loss[loss=0.07683, simple_loss=0.11, pruned_loss=0.01481, audio_tagging_loss=0.007018, over 15779.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08896, pruned_loss=0.01209, audio_tagging_loss=0.008771, over 3040908.21 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:03:06,045 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 07:03:15,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3274226.6666666665, ans=0.125 2023-11-26 07:03:15,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3274226.6666666665, ans=0.2 2023-11-26 07:03:15,786 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.17 vs. limit=15.0 2023-11-26 07:03:28,325 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491150 2023-11-26 07:03:32,397 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.543e+01 8.834e+01 9.375e+01 1.026e+02 1.327e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-26 07:03:34,521 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10200, loss[loss=0.08008, simple_loss=0.1069, pruned_loss=0.01811, audio_tagging_loss=0.008535, over 15072.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08967, pruned_loss=0.01236, audio_tagging_loss=0.008857, over 3049357.58 frames. 
], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:03:41,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3274360.0, ans=0.125 2023-11-26 07:03:46,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3274426.6666666665, ans=0.0 2023-11-26 07:03:49,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3274426.6666666665, ans=0.0 2023-11-26 07:03:53,654 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:03:55,600 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 07:04:00,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3274493.3333333335, ans=0.1 2023-11-26 07:04:08,559 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.50 vs. limit=10.0 2023-11-26 07:04:15,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3274560.0, ans=0.1 2023-11-26 07:04:23,780 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491200 2023-11-26 07:04:30,705 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10250, loss[loss=0.05581, simple_loss=0.06693, pruned_loss=0.009295, audio_tagging_loss=0.01305, over 15159.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08945, pruned_loss=0.01237, audio_tagging_loss=0.008922, over 3051912.03 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:04:32,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3274693.3333333335, ans=0.125 2023-11-26 07:04:40,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3274760.0, ans=0.125 2023-11-26 07:04:54,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=3274826.6666666665, ans=0.2 2023-11-26 07:05:19,435 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491250 2023-11-26 07:05:23,637 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.179e+01 8.938e+01 9.745e+01 1.064e+02 1.415e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-26 07:05:23,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3274960.0, ans=0.125 2023-11-26 07:05:25,357 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.89 vs. limit=6.0 2023-11-26 07:05:25,434 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.98 vs. 
limit=15.0
2023-11-26 07:05:25,861 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10300, loss[loss=0.07774, simple_loss=0.1076, pruned_loss=0.01401, audio_tagging_loss=0.009909, over 16163.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.08953, pruned_loss=0.01246, audio_tagging_loss=0.009008, over 3055797.79 frames. ], batch size: 61, lr: 1.64e-03, grad_scale: 32.0
2023-11-26 07:05:26,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3275026.6666666665, ans=0.125
2023-11-26 07:05:33,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3275026.6666666665, ans=0.2
2023-11-26 07:05:34,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3275026.6666666665, ans=0.2
2023-11-26 07:05:42,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3275093.3333333335, ans=0.2
2023-11-26 07:06:06,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3275226.6666666665, ans=0.2
2023-11-26 07:06:15,254 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491300
2023-11-26 07:06:22,374 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10350, loss[loss=0.08575, simple_loss=0.1139, pruned_loss=0.02018, audio_tagging_loss=0.008614, over 14751.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.08958, pruned_loss=0.01243, audio_tagging_loss=0.009062, over 3048145.83 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 32.0
2023-11-26 07:06:30,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3275360.0, ans=0.0
2023-11-26 07:06:30,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3275360.0, ans=0.1
2023-11-26 07:06:31,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3275360.0, ans=0.07
2023-11-26 07:06:35,113 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 07:06:42,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3275426.6666666665, ans=0.0
2023-11-26 07:07:08,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3275626.6666666665, ans=0.1
2023-11-26 07:07:11,735 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491350
2023-11-26 07:07:16,381 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.482e+01 8.783e+01 9.372e+01 1.013e+02 2.774e+02, threshold=1.874e+02, percent-clipped=1.0
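The ScheduledFloat entries record regularization hyperparameters (dropout probabilities, skip rates, bypass scale floors) that are functions of batch_count rather than constants, each following its own piecewise-linear schedule. Note also that the 07:07:16 optim.py line just above is the only one in this section with percent-clipped=1.0, the max grad-norm of 2.774e+02 having exceeded the 1.874e+02 threshold. A minimal sketch of a piecewise-linear schedule of this kind; the breakpoints below are invented for illustration, not read from this run:

    def scheduled_float(batch_count: float, points) -> float:
        """Piecewise-linear value over batch_count, held flat beyond the endpoints.

        points: (batch_count, value) breakpoints, sorted by batch_count.
        """
        if batch_count <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if batch_count <= x1:
                # Linear interpolation between the surrounding breakpoints.
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        return points[-1][1]

    # Hypothetical schedule: hold 0.3 for 20k batches, decay to 0.1 by 40k, then hold.
    print(round(scheduled_float(30000.0, [(20000.0, 0.3), (40000.0, 0.1)]), 3))    # 0.2
    print(round(scheduled_float(3275360.0, [(20000.0, 0.3), (40000.0, 0.1)]), 3))  # 0.1

2023-11-26 07:07:18,538 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10400, loss[loss=0.06349, simple_loss=0.08894, pruned_loss=0.01089, audio_tagging_loss=0.008127, over 15737.00 frames.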
], batch size: 59, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:07:25,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3275693.3333333335, ans=0.125 2023-11-26 07:07:36,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3275760.0, ans=0.125 2023-11-26 07:07:43,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3275826.6666666665, ans=0.0 2023-11-26 07:07:45,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3275826.6666666665, ans=0.035 2023-11-26 07:07:46,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3275826.6666666665, ans=0.0 2023-11-26 07:07:49,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3275826.6666666665, ans=0.0 2023-11-26 07:07:52,450 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=15.0 2023-11-26 07:08:07,528 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491400 2023-11-26 07:08:14,163 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10450, loss[loss=0.05344, simple_loss=0.07238, pruned_loss=0.007282, audio_tagging_loss=0.009966, over 15648.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.08959, pruned_loss=0.01227, audio_tagging_loss=0.009129, over 3043644.72 frames. ], batch size: 62, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:08:20,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3276026.6666666665, ans=0.0 2023-11-26 07:08:27,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3276093.3333333335, ans=0.1 2023-11-26 07:08:34,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3276093.3333333335, ans=0.1 2023-11-26 07:08:35,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.15 vs. limit=15.0 2023-11-26 07:08:42,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3276160.0, ans=0.125 2023-11-26 07:08:49,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3276226.6666666665, ans=0.125 2023-11-26 07:09:03,206 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491450 2023-11-26 07:09:07,822 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.429e+01 8.707e+01 9.260e+01 9.868e+01 1.345e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-26 07:09:10,554 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10500, loss[loss=0.07207, simple_loss=0.09855, pruned_loss=0.01398, audio_tagging_loss=0.00882, over 14475.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.08993, pruned_loss=0.01245, audio_tagging_loss=0.008992, over 3042507.72 frames. 
], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:09:10,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3276360.0, ans=0.125 2023-11-26 07:09:29,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3276426.6666666665, ans=0.0 2023-11-26 07:09:30,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3276426.6666666665, ans=0.0 2023-11-26 07:09:37,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3276493.3333333335, ans=0.125 2023-11-26 07:09:44,883 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.59 vs. limit=15.0 2023-11-26 07:09:57,500 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:09:59,892 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491500 2023-11-26 07:10:06,819 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10550, loss[loss=0.09159, simple_loss=0.1218, pruned_loss=0.02332, audio_tagging_loss=0.007381, over 15818.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08972, pruned_loss=0.01237, audio_tagging_loss=0.008869, over 3052272.69 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:10:13,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3276693.3333333335, ans=0.0 2023-11-26 07:10:21,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3276760.0, ans=0.125 2023-11-26 07:10:25,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3276760.0, ans=0.125 2023-11-26 07:10:44,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3276893.3333333335, ans=0.125 2023-11-26 07:10:55,642 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491550 2023-11-26 07:11:00,813 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.273e+01 8.562e+01 9.260e+01 9.916e+01 1.260e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-26 07:11:01,903 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10600, loss[loss=0.03963, simple_loss=0.04674, pruned_loss=0.006764, audio_tagging_loss=0.009493, over 14314.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09003, pruned_loss=0.01232, audio_tagging_loss=0.008736, over 3050031.99 frames. 
], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:11:06,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3277026.6666666665, ans=0.0 2023-11-26 07:11:18,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3277093.3333333335, ans=0.5 2023-11-26 07:11:34,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3277226.6666666665, ans=0.2 2023-11-26 07:11:40,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3277226.6666666665, ans=10.0 2023-11-26 07:11:50,593 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491600 2023-11-26 07:11:53,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3277293.3333333335, ans=0.95 2023-11-26 07:11:57,750 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10650, loss[loss=0.05654, simple_loss=0.08164, pruned_loss=0.008489, audio_tagging_loss=0.007234, over 14664.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08959, pruned_loss=0.01224, audio_tagging_loss=0.008711, over 3046942.74 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:12:01,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=3277360.0, ans=0.025 2023-11-26 07:12:13,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3277426.6666666665, ans=0.125 2023-11-26 07:12:15,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3277426.6666666665, ans=0.125 2023-11-26 07:12:26,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3277493.3333333335, ans=0.1 2023-11-26 07:12:33,513 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=22.5 2023-11-26 07:12:37,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3277560.0, ans=0.125 2023-11-26 07:12:42,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3277626.6666666665, ans=0.2 2023-11-26 07:12:46,772 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491650 2023-11-26 07:12:48,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3277626.6666666665, ans=0.125 2023-11-26 07:12:50,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3277626.6666666665, ans=0.2 2023-11-26 07:12:53,545 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.823e+01 8.757e+01 9.487e+01 1.015e+02 1.210e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-26 07:12:53,570 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10700, loss[loss=0.09096, simple_loss=0.1211, pruned_loss=0.01863, audio_tagging_loss=0.01179, over 16341.00 frames. 
], tot_loss[loss=0.06559, simple_loss=0.08916, pruned_loss=0.01223, audio_tagging_loss=0.008773, over 3040835.28 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 8.0 2023-11-26 07:13:00,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3277693.3333333335, ans=0.1 2023-11-26 07:13:04,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3277760.0, ans=0.125 2023-11-26 07:13:14,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3277826.6666666665, ans=0.125 2023-11-26 07:13:24,781 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:13:30,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3277893.3333333335, ans=0.125 2023-11-26 07:13:42,683 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491700 2023-11-26 07:13:48,926 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10750, loss[loss=0.05196, simple_loss=0.06263, pruned_loss=0.01246, audio_tagging_loss=0.008195, over 13896.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08948, pruned_loss=0.01228, audio_tagging_loss=0.00869, over 3046084.65 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 8.0 2023-11-26 07:13:56,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3278026.6666666665, ans=0.125 2023-11-26 07:14:01,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3278093.3333333335, ans=0.125 2023-11-26 07:14:01,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3278093.3333333335, ans=0.125 2023-11-26 07:14:13,520 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.72 vs. 
limit=10.0 2023-11-26 07:14:21,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3278160.0, ans=0.0 2023-11-26 07:14:28,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3278226.6666666665, ans=0.125 2023-11-26 07:14:32,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3278293.3333333335, ans=0.125 2023-11-26 07:14:32,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3278293.3333333335, ans=0.0 2023-11-26 07:14:33,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3278293.3333333335, ans=0.125 2023-11-26 07:14:37,782 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491750 2023-11-26 07:14:44,172 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.353e+01 8.438e+01 9.296e+01 1.012e+02 1.543e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-26 07:14:44,210 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10800, loss[loss=0.08632, simple_loss=0.1107, pruned_loss=0.01971, audio_tagging_loss=0.01126, over 15613.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08995, pruned_loss=0.01246, audio_tagging_loss=0.008714, over 3051506.52 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:14:56,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3278426.6666666665, ans=0.125 2023-11-26 07:15:11,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3278493.3333333335, ans=0.1 2023-11-26 07:15:31,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3278626.6666666665, ans=0.125 2023-11-26 07:15:31,987 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.32 vs. limit=15.0 2023-11-26 07:15:33,511 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491800 2023-11-26 07:15:36,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3278626.6666666665, ans=0.1 2023-11-26 07:15:41,209 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10850, loss[loss=0.06225, simple_loss=0.08305, pruned_loss=0.01221, audio_tagging_loss=0.008521, over 15782.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09063, pruned_loss=0.01258, audio_tagging_loss=0.008725, over 3049264.69 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:15:41,625 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.36 vs. 
limit=15.0 2023-11-26 07:15:48,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3278693.3333333335, ans=0.1 2023-11-26 07:15:53,602 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:16:06,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3278826.6666666665, ans=0.2 2023-11-26 07:16:18,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3278893.3333333335, ans=0.125 2023-11-26 07:16:22,452 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.10 vs. limit=12.0 2023-11-26 07:16:23,487 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.15 vs. limit=12.0 2023-11-26 07:16:30,335 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491850 2023-11-26 07:16:33,483 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 07:16:36,666 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.345e+01 8.766e+01 9.451e+01 1.013e+02 1.235e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-26 07:16:36,692 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10900, loss[loss=0.06954, simple_loss=0.09159, pruned_loss=0.01319, audio_tagging_loss=0.01056, over 15153.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08962, pruned_loss=0.01242, audio_tagging_loss=0.008883, over 3052050.68 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:16:52,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3279093.3333333335, ans=0.0 2023-11-26 07:17:04,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3279160.0, ans=0.125 2023-11-26 07:17:24,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3279293.3333333335, ans=0.125 2023-11-26 07:17:24,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3279293.3333333335, ans=0.125 2023-11-26 07:17:25,291 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491900 2023-11-26 07:17:29,836 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.11 vs. limit=15.0 2023-11-26 07:17:31,491 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 10950, loss[loss=0.05946, simple_loss=0.07182, pruned_loss=0.01171, audio_tagging_loss=0.01183, over 14592.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09013, pruned_loss=0.01251, audio_tagging_loss=0.008843, over 3054969.78 frames. 
], batch size: 55, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:17:35,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3279360.0, ans=0.0 2023-11-26 07:17:38,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3279360.0, ans=0.125 2023-11-26 07:17:40,782 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:17:48,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3279426.6666666665, ans=0.0 2023-11-26 07:17:52,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3279426.6666666665, ans=0.125 2023-11-26 07:17:58,232 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:18:01,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3279493.3333333335, ans=0.125 2023-11-26 07:18:03,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3279493.3333333335, ans=0.125 2023-11-26 07:18:20,747 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 491950 2023-11-26 07:18:27,602 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.477e+01 8.778e+01 9.414e+01 1.024e+02 1.293e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-26 07:18:27,631 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11000, loss[loss=0.05375, simple_loss=0.06994, pruned_loss=0.008447, audio_tagging_loss=0.01033, over 14465.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08972, pruned_loss=0.01248, audio_tagging_loss=0.008854, over 3047716.18 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:18:31,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3279693.3333333335, ans=0.1 2023-11-26 07:18:33,154 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:18:36,597 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 07:18:40,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3279760.0, ans=0.0 2023-11-26 07:18:42,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3279760.0, ans=0.1 2023-11-26 07:18:47,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3279760.0, ans=0.125 2023-11-26 07:19:12,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3279960.0, ans=0.0 2023-11-26 07:19:16,813 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492000 2023-11-26 07:19:18,079 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-492000.pt 2023-11-26 07:19:25,741 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11050, loss[loss=0.07188, simple_loss=0.09563, pruned_loss=0.013, audio_tagging_loss=0.01107, over 15742.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08952, pruned_loss=0.01237, audio_tagging_loss=0.008941, over 3044866.14 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:19:27,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3280026.6666666665, ans=15.0 2023-11-26 07:19:27,207 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.90 vs. limit=15.0 2023-11-26 07:19:29,383 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.63 vs. limit=15.0 2023-11-26 07:19:33,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3280026.6666666665, ans=0.125 2023-11-26 07:19:44,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3280093.3333333335, ans=0.125 2023-11-26 07:20:04,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3280226.6666666665, ans=0.1 2023-11-26 07:20:14,522 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492050 2023-11-26 07:20:18,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3280293.3333333335, ans=0.1 2023-11-26 07:20:18,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3280293.3333333335, ans=0.125 2023-11-26 07:20:20,693 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.875e+01 9.418e+01 1.004e+02 1.333e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 07:20:20,721 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11100, loss[loss=0.07123, simple_loss=0.09581, pruned_loss=0.01501, audio_tagging_loss=0.008312, over 15047.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.08985, pruned_loss=0.01246, audio_tagging_loss=0.009042, over 3049226.81 frames. 
], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:20:27,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3280360.0, ans=0.0 2023-11-26 07:20:35,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3280426.6666666665, ans=0.09899494936611666 2023-11-26 07:20:51,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3280493.3333333335, ans=0.2 2023-11-26 07:21:01,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3280560.0, ans=0.125 2023-11-26 07:21:03,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3280560.0, ans=0.125 2023-11-26 07:21:05,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3280626.6666666665, ans=0.07 2023-11-26 07:21:09,388 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492100 2023-11-26 07:21:16,282 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11150, loss[loss=0.08238, simple_loss=0.1172, pruned_loss=0.01792, audio_tagging_loss=0.00589, over 15547.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.08998, pruned_loss=0.01257, audio_tagging_loss=0.009106, over 3052346.04 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:21:16,945 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.05 vs. limit=15.0 2023-11-26 07:21:28,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3280760.0, ans=0.1 2023-11-26 07:21:29,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3280760.0, ans=0.125 2023-11-26 07:21:41,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3280826.6666666665, ans=0.125 2023-11-26 07:21:48,415 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.85 vs. limit=15.0 2023-11-26 07:21:57,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3280893.3333333335, ans=0.2 2023-11-26 07:22:00,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3280960.0, ans=0.5 2023-11-26 07:22:05,965 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492150 2023-11-26 07:22:10,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3280960.0, ans=0.125 2023-11-26 07:22:12,759 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.732e+01 8.937e+01 9.375e+01 1.012e+02 1.316e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-26 07:22:12,787 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11200, loss[loss=0.06077, simple_loss=0.08531, pruned_loss=0.009606, audio_tagging_loss=0.008509, over 16945.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.08966, pruned_loss=0.01242, audio_tagging_loss=0.009124, over 3054266.86 frames. 
], batch size: 63, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:22:17,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3281026.6666666665, ans=0.0 2023-11-26 07:22:22,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3281093.3333333335, ans=0.125 2023-11-26 07:22:32,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3281093.3333333335, ans=0.125 2023-11-26 07:22:37,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3281160.0, ans=0.5 2023-11-26 07:22:40,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3281160.0, ans=0.1 2023-11-26 07:22:49,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3281226.6666666665, ans=0.0 2023-11-26 07:22:56,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3281293.3333333335, ans=0.125 2023-11-26 07:23:01,946 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492200 2023-11-26 07:23:08,503 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11250, loss[loss=0.06493, simple_loss=0.08515, pruned_loss=0.01347, audio_tagging_loss=0.00888, over 16393.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08906, pruned_loss=0.0123, audio_tagging_loss=0.009119, over 3051675.01 frames. ], batch size: 62, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:23:23,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3281426.6666666665, ans=0.0 2023-11-26 07:23:29,132 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.98 vs. limit=10.0 2023-11-26 07:23:35,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3281493.3333333335, ans=0.0 2023-11-26 07:23:46,136 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.50 vs. limit=15.0 2023-11-26 07:23:53,472 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=15.0 2023-11-26 07:23:57,225 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492250 2023-11-26 07:24:04,030 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.238e+01 8.640e+01 9.467e+01 1.012e+02 1.426e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 07:24:04,058 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11300, loss[loss=0.07757, simple_loss=0.09631, pruned_loss=0.01643, audio_tagging_loss=0.01299, over 15251.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.08995, pruned_loss=0.0126, audio_tagging_loss=0.008936, over 3051185.00 frames. 
], batch size: 61, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:24:04,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3281693.3333333335, ans=0.2 2023-11-26 07:24:30,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3281826.6666666665, ans=0.0 2023-11-26 07:24:42,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3281893.3333333335, ans=0.125 2023-11-26 07:24:49,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3281960.0, ans=0.125 2023-11-26 07:24:53,316 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492300 2023-11-26 07:25:00,216 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11350, loss[loss=0.05705, simple_loss=0.06923, pruned_loss=0.01158, audio_tagging_loss=0.01086, over 14479.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.0901, pruned_loss=0.01253, audio_tagging_loss=0.008859, over 3049476.31 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:25:09,282 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.22 vs. limit=15.0 2023-11-26 07:25:46,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3282293.3333333335, ans=0.125 2023-11-26 07:25:48,955 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492350 2023-11-26 07:25:55,294 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 8.982e+01 9.660e+01 1.025e+02 3.694e+02, threshold=1.932e+02, percent-clipped=1.0 2023-11-26 07:25:55,321 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11400, loss[loss=0.07759, simple_loss=0.1085, pruned_loss=0.01414, audio_tagging_loss=0.0092, over 15802.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09083, pruned_loss=0.01269, audio_tagging_loss=0.00881, over 3047728.20 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:25:56,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3282360.0, ans=0.1 2023-11-26 07:26:07,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3282426.6666666665, ans=0.125 2023-11-26 07:26:15,217 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.13 vs. limit=15.0 2023-11-26 07:26:18,796 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.13 vs. limit=15.0 2023-11-26 07:26:19,811 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.32 vs. limit=15.0 2023-11-26 07:26:44,827 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492400 2023-11-26 07:26:51,867 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11450, loss[loss=0.07414, simple_loss=0.1084, pruned_loss=0.01499, audio_tagging_loss=0.004926, over 14687.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09143, pruned_loss=0.0127, audio_tagging_loss=0.00872, over 3046819.97 frames. 
], batch size: 53, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:27:14,734 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.37 vs. limit=15.0 2023-11-26 07:27:25,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3282893.3333333335, ans=0.0 2023-11-26 07:27:33,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3282893.3333333335, ans=0.2 2023-11-26 07:27:38,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3282960.0, ans=0.1 2023-11-26 07:27:40,493 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492450 2023-11-26 07:27:47,907 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 8.866e+01 9.675e+01 1.039e+02 1.240e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-26 07:27:47,934 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11500, loss[loss=0.06625, simple_loss=0.08309, pruned_loss=0.01366, audio_tagging_loss=0.01104, over 15498.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09071, pruned_loss=0.01269, audio_tagging_loss=0.008803, over 3042554.54 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:28:03,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3283093.3333333335, ans=0.0 2023-11-26 07:28:03,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3283093.3333333335, ans=0.125 2023-11-26 07:28:16,871 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:28:19,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3283226.6666666665, ans=0.0 2023-11-26 07:28:33,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3283293.3333333335, ans=0.125 2023-11-26 07:28:36,730 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492500 2023-11-26 07:28:42,923 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11550, loss[loss=0.05802, simple_loss=0.06729, pruned_loss=0.01395, audio_tagging_loss=0.01042, over 14642.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09112, pruned_loss=0.01269, audio_tagging_loss=0.008728, over 3045120.57 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:28:45,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3283360.0, ans=0.0 2023-11-26 07:29:01,414 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.43 vs. limit=22.5 2023-11-26 07:29:03,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3283426.6666666665, ans=0.0 2023-11-26 07:29:14,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3283493.3333333335, ans=0.125 2023-11-26 07:29:16,758 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 07:29:20,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3283560.0, ans=0.2 2023-11-26 07:29:31,615 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492550 2023-11-26 07:29:34,931 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.37 vs. limit=15.0 2023-11-26 07:29:38,914 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.351e+01 8.923e+01 9.634e+01 1.014e+02 1.304e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-26 07:29:38,941 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11600, loss[loss=0.05377, simple_loss=0.07455, pruned_loss=0.007323, audio_tagging_loss=0.009174, over 15772.00 frames. ], tot_loss[loss=0.06775, simple_loss=0.09229, pruned_loss=0.01293, audio_tagging_loss=0.008677, over 3047539.78 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:29:52,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3283760.0, ans=0.125 2023-11-26 07:30:27,717 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492600 2023-11-26 07:30:34,816 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11650, loss[loss=0.066, simple_loss=0.08561, pruned_loss=0.0113, audio_tagging_loss=0.01189, over 15134.00 frames. ], tot_loss[loss=0.06763, simple_loss=0.09209, pruned_loss=0.01288, audio_tagging_loss=0.008709, over 3043237.73 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:30:38,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3284026.6666666665, ans=0.125 2023-11-26 07:31:07,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3284226.6666666665, ans=0.125 2023-11-26 07:31:23,690 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492650 2023-11-26 07:31:24,269 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.37 vs. limit=12.0 2023-11-26 07:31:25,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3284293.3333333335, ans=0.125 2023-11-26 07:31:27,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3284293.3333333335, ans=0.1 2023-11-26 07:31:29,937 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.387e+01 8.493e+01 9.108e+01 9.754e+01 1.305e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-26 07:31:29,964 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11700, loss[loss=0.06164, simple_loss=0.08267, pruned_loss=0.0107, audio_tagging_loss=0.0096, over 14007.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09083, pruned_loss=0.01269, audio_tagging_loss=0.008807, over 3044266.08 frames. 
], batch size: 53, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:31:38,005 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.06 vs. limit=15.0 2023-11-26 07:31:44,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3284426.6666666665, ans=22.5 2023-11-26 07:31:56,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3284493.3333333335, ans=0.0 2023-11-26 07:32:18,459 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492700 2023-11-26 07:32:24,688 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11750, loss[loss=0.04803, simple_loss=0.05823, pruned_loss=0.007999, audio_tagging_loss=0.01092, over 14156.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09099, pruned_loss=0.01283, audio_tagging_loss=0.008776, over 3047409.29 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:32:34,668 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.97 vs. limit=15.0 2023-11-26 07:32:39,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3284760.0, ans=0.0 2023-11-26 07:33:03,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3284893.3333333335, ans=0.1 2023-11-26 07:33:07,500 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.17 vs. limit=12.0 2023-11-26 07:33:14,327 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492750 2023-11-26 07:33:21,091 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.269e+01 8.644e+01 9.343e+01 1.016e+02 1.345e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 07:33:21,124 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11800, loss[loss=0.04133, simple_loss=0.05484, pruned_loss=0.003708, audio_tagging_loss=0.0102, over 15144.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09103, pruned_loss=0.01291, audio_tagging_loss=0.008827, over 3043288.14 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:33:30,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3285026.6666666665, ans=0.1 2023-11-26 07:33:31,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3285093.3333333335, ans=0.125 2023-11-26 07:33:39,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3285093.3333333335, ans=0.2 2023-11-26 07:34:08,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3285293.3333333335, ans=0.125 2023-11-26 07:34:10,110 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492800 2023-11-26 07:34:16,631 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11850, loss[loss=0.07477, simple_loss=0.09982, pruned_loss=0.01656, audio_tagging_loss=0.008295, over 15076.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.09084, pruned_loss=0.01296, audio_tagging_loss=0.008931, over 3048915.27 frames. 
], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:34:36,873 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.06 vs. limit=10.0 2023-11-26 07:34:37,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3285493.3333333335, ans=0.05 2023-11-26 07:34:52,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3285560.0, ans=0.125 2023-11-26 07:35:05,291 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492850 2023-11-26 07:35:10,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3285693.3333333335, ans=0.0 2023-11-26 07:35:11,678 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11900, loss[loss=0.06061, simple_loss=0.0758, pruned_loss=0.01179, audio_tagging_loss=0.01092, over 15192.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09082, pruned_loss=0.01287, audio_tagging_loss=0.008967, over 3042791.10 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:35:12,701 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.488e+01 8.644e+01 9.176e+01 9.875e+01 1.365e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-26 07:35:13,203 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.97 vs. limit=15.0 2023-11-26 07:35:15,558 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.91 vs. limit=15.0 2023-11-26 07:35:18,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3285693.3333333335, ans=0.125 2023-11-26 07:35:44,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3285893.3333333335, ans=0.0 2023-11-26 07:35:48,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3285893.3333333335, ans=0.2 2023-11-26 07:36:00,454 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492900 2023-11-26 07:36:07,289 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 11950, loss[loss=0.06776, simple_loss=0.09336, pruned_loss=0.01463, audio_tagging_loss=0.006462, over 14363.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09041, pruned_loss=0.01285, audio_tagging_loss=0.009098, over 3035148.58 frames. 
], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:36:08,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3286026.6666666665, ans=0.04949747468305833 2023-11-26 07:36:15,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3286026.6666666665, ans=0.125 2023-11-26 07:36:19,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3286093.3333333335, ans=0.0 2023-11-26 07:36:27,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3286093.3333333335, ans=0.2 2023-11-26 07:36:54,465 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 492950 2023-11-26 07:37:00,469 INFO [train_asr.py:1235] (0/4) Epoch 41, batch 12000, loss[loss=0.0875, simple_loss=0.135, pruned_loss=0.01473, audio_tagging_loss=0.005249, over 15888.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09014, pruned_loss=0.01264, audio_tagging_loss=0.009083, over 3036009.66 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:37:00,471 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-26 07:37:18,125 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.0048, 5.8503, 5.6402, 5.5287], device='cuda:0') 2023-11-26 07:37:33,030 INFO [train_asr.py:1267] (0/4) Epoch 41, validation: loss=0.05803, simple_loss=0.05068, pruned_loss=0.005323, audio_tagging_loss=0.02736, over 4681554.00 frames. 2023-11-26 07:37:33,031 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-26 07:37:35,014 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.239e+01 8.785e+01 9.392e+01 1.025e+02 1.388e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-26 07:37:39,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3286360.0, ans=0.125 2023-11-26 07:37:51,565 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:37:51,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3286426.6666666665, ans=0.0 2023-11-26 07:37:57,560 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-41.pt 2023-11-26 07:38:28,137 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 0, loss[loss=0.073, simple_loss=0.08671, pruned_loss=0.0114, audio_tagging_loss=0.01824, over 15611.00 frames. ], tot_loss[loss=0.073, simple_loss=0.08671, pruned_loss=0.0114, audio_tagging_loss=0.01824, over 15611.00 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:38:28,139 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-26 07:38:41,692 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.5782, 2.5407, 3.3887, 2.7264], device='cuda:0') 2023-11-26 07:38:59,442 INFO [train_asr.py:1267] (0/4) Epoch 42, validation: loss=0.05791, simple_loss=0.05064, pruned_loss=0.005256, audio_tagging_loss=0.02733, over 4681554.00 frames. 
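A note on the loss figures above: the per-batch entries decompose consistently as loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss; for the "Epoch 41, batch 12000" entry, 0.5 * 0.135 + 0.01473 + 0.005249 ≈ 0.0875. The grad-norm lines follow a similarly fixed rule: the logged threshold is Clipping_scale times the median quartile (2.0 * 9.392e+01 ≈ 1.878e+02 in the entry above). A minimal sketch that checks both identities against the logged numbers; the 0.5 and 1.0 weights are inferred from the log arithmetic itself, not quoted from train_asr.py:

def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float,
                  simple_scale: float = 0.5,
                  tagging_scale: float = 1.0) -> float:
    # The weights are assumptions read off the logged entries above.
    return (simple_scale * simple_loss
            + pruned_loss
            + tagging_scale * audio_tagging_loss)

# "Epoch 41, batch 12000" entry above:
assert abs(combined_loss(0.135, 0.01473, 0.005249) - 0.0875) < 5e-4

# Grad-norm threshold pattern: Clipping_scale (2.0) x median quartile.
assert abs(2.0 * 93.92 - 187.8) < 0.1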
2023-11-26 07:38:59,443 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-26 07:39:00,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3286513.3333333335, ans=0.0 2023-11-26 07:39:01,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3286513.3333333335, ans=0.0 2023-11-26 07:39:02,101 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.21 vs. limit=10.0 2023-11-26 07:39:08,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3286513.3333333335, ans=0.125 2023-11-26 07:39:08,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3286513.3333333335, ans=0.125 2023-11-26 07:39:17,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3286580.0, ans=0.125 2023-11-26 07:39:19,606 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.69 vs. limit=22.5 2023-11-26 07:39:23,561 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493000 2023-11-26 07:39:39,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3286713.3333333335, ans=0.125 2023-11-26 07:39:54,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3286780.0, ans=0.125 2023-11-26 07:39:56,098 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 50, loss[loss=0.08894, simple_loss=0.1117, pruned_loss=0.01909, audio_tagging_loss=0.01401, over 16420.00 frames. ], tot_loss[loss=0.07681, simple_loss=0.09375, pruned_loss=0.01325, audio_tagging_loss=0.01669, over 696830.14 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:40:12,702 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.95 vs. limit=15.0 2023-11-26 07:40:17,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3286980.0, ans=0.07 2023-11-26 07:40:19,851 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493050 2023-11-26 07:40:23,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3286980.0, ans=0.125 2023-11-26 07:40:29,378 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.809e+01 9.630e+01 1.022e+02 1.088e+02 1.448e+02, threshold=2.045e+02, percent-clipped=0.0 2023-11-26 07:40:45,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3287113.3333333335, ans=0.0 2023-11-26 07:40:46,262 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.47 vs. limit=15.0 2023-11-26 07:40:52,962 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 100, loss[loss=0.04757, simple_loss=0.04733, pruned_loss=0.007358, audio_tagging_loss=0.01655, over 14593.00 frames. 
], tot_loss[loss=0.07473, simple_loss=0.09239, pruned_loss=0.01254, audio_tagging_loss=0.01599, over 1213454.56 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:41:06,289 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:41:08,829 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.21 vs. limit=15.0 2023-11-26 07:41:09,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3287246.6666666665, ans=0.2 2023-11-26 07:41:15,011 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.57 vs. limit=22.5 2023-11-26 07:41:16,409 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493100 2023-11-26 07:41:23,431 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.13 vs. limit=15.0 2023-11-26 07:41:32,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3287380.0, ans=0.125 2023-11-26 07:41:34,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3287380.0, ans=0.125 2023-11-26 07:41:44,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3287446.6666666665, ans=0.2 2023-11-26 07:41:48,979 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 150, loss[loss=0.05121, simple_loss=0.06726, pruned_loss=0.006582, audio_tagging_loss=0.011, over 15367.00 frames. ], tot_loss[loss=0.07308, simple_loss=0.09234, pruned_loss=0.01252, audio_tagging_loss=0.01439, over 1621747.64 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:41:49,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3287513.3333333335, ans=0.0 2023-11-26 07:41:52,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3287513.3333333335, ans=0.125 2023-11-26 07:42:13,184 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493150 2023-11-26 07:42:13,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3287646.6666666665, ans=0.2 2023-11-26 07:42:21,489 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.833e+01 9.081e+01 9.641e+01 1.033e+02 1.343e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-26 07:42:26,462 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.15 vs. limit=22.5 2023-11-26 07:42:44,912 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 200, loss[loss=0.04749, simple_loss=0.06431, pruned_loss=0.006997, audio_tagging_loss=0.008344, over 14945.00 frames. ], tot_loss[loss=0.07199, simple_loss=0.09274, pruned_loss=0.01291, audio_tagging_loss=0.01271, over 1936377.80 frames. 
], batch size: 56, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:43:08,501 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493200 2023-11-26 07:43:25,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3288046.6666666665, ans=0.025 2023-11-26 07:43:30,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3288113.3333333335, ans=0.0 2023-11-26 07:43:40,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3288113.3333333335, ans=0.125 2023-11-26 07:43:41,904 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 250, loss[loss=0.0588, simple_loss=0.0663, pruned_loss=0.01285, audio_tagging_loss=0.0128, over 13765.00 frames. ], tot_loss[loss=0.07043, simple_loss=0.09177, pruned_loss=0.01295, audio_tagging_loss=0.01159, over 2182897.30 frames. ], batch size: 53, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:43:42,359 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.44 vs. limit=12.0 2023-11-26 07:43:43,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3288180.0, ans=0.0 2023-11-26 07:43:48,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3288180.0, ans=0.1 2023-11-26 07:44:00,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3288246.6666666665, ans=0.0 2023-11-26 07:44:01,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3288246.6666666665, ans=0.0 2023-11-26 07:44:05,389 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493250 2023-11-26 07:44:14,641 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.995e+01 8.741e+01 9.429e+01 1.027e+02 1.277e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 07:44:22,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3288380.0, ans=0.125 2023-11-26 07:44:37,412 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 300, loss[loss=0.04144, simple_loss=0.0559, pruned_loss=0.006266, audio_tagging_loss=0.007223, over 15007.00 frames. ], tot_loss[loss=0.06996, simple_loss=0.09263, pruned_loss=0.01302, audio_tagging_loss=0.01063, over 2387353.92 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:44:52,371 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.09 vs. limit=15.0 2023-11-26 07:44:59,252 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2023-11-26 07:45:00,696 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493300 2023-11-26 07:45:12,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3288713.3333333335, ans=0.0 2023-11-26 07:45:22,692 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.85 vs. 
limit=15.0 2023-11-26 07:45:26,498 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:45:31,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3288780.0, ans=0.09899494936611666 2023-11-26 07:45:33,046 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 350, loss[loss=0.06849, simple_loss=0.08665, pruned_loss=0.01374, audio_tagging_loss=0.01143, over 14841.00 frames. ], tot_loss[loss=0.06908, simple_loss=0.09222, pruned_loss=0.01281, audio_tagging_loss=0.01016, over 2538892.90 frames. ], batch size: 54, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:45:37,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3288846.6666666665, ans=0.0 2023-11-26 07:45:56,626 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493350 2023-11-26 07:46:06,686 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.530e+01 8.599e+01 9.325e+01 1.001e+02 1.376e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 07:46:17,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3289113.3333333335, ans=0.125 2023-11-26 07:46:17,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3289113.3333333335, ans=0.125 2023-11-26 07:46:23,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3289113.3333333335, ans=0.1 2023-11-26 07:46:29,004 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 400, loss[loss=0.07487, simple_loss=0.1038, pruned_loss=0.01299, audio_tagging_loss=0.009972, over 16629.00 frames. ], tot_loss[loss=0.06852, simple_loss=0.09169, pruned_loss=0.01284, audio_tagging_loss=0.009828, over 2647970.76 frames. ], batch size: 61, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:46:37,125 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:46:52,928 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493400 2023-11-26 07:46:53,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3289313.3333333335, ans=0.2 2023-11-26 07:46:59,063 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.30 vs. 
limit=22.5 2023-11-26 07:47:00,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3289313.3333333335, ans=0.125 2023-11-26 07:47:07,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3289380.0, ans=0.125 2023-11-26 07:47:11,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3289380.0, ans=0.125 2023-11-26 07:47:13,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3289446.6666666665, ans=0.0 2023-11-26 07:47:15,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3289446.6666666665, ans=0.125 2023-11-26 07:47:18,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3289446.6666666665, ans=0.0 2023-11-26 07:47:21,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3289446.6666666665, ans=0.125 2023-11-26 07:47:21,285 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.56 vs. limit=15.0 2023-11-26 07:47:22,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3289446.6666666665, ans=0.0 2023-11-26 07:47:25,004 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 450, loss[loss=0.04542, simple_loss=0.05232, pruned_loss=0.007704, audio_tagging_loss=0.01156, over 14711.00 frames. ], tot_loss[loss=0.06797, simple_loss=0.09149, pruned_loss=0.01272, audio_tagging_loss=0.009499, over 2735290.55 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:47:29,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3289513.3333333335, ans=0.125 2023-11-26 07:47:41,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3289580.0, ans=0.1 2023-11-26 07:47:48,969 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493450 2023-11-26 07:47:58,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3289713.3333333335, ans=0.1 2023-11-26 07:47:59,456 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.849e+01 8.870e+01 9.366e+01 1.009e+02 1.216e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 07:48:12,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3289780.0, ans=0.1 2023-11-26 07:48:14,890 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.02 vs. limit=15.0 2023-11-26 07:48:21,395 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 500, loss[loss=0.07555, simple_loss=0.1138, pruned_loss=0.01056, audio_tagging_loss=0.008083, over 16343.00 frames. ], tot_loss[loss=0.0673, simple_loss=0.09074, pruned_loss=0.01263, audio_tagging_loss=0.009297, over 2800646.35 frames. 
], batch size: 59, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:48:22,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3289846.6666666665, ans=0.2 2023-11-26 07:48:33,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3289913.3333333335, ans=0.125 2023-11-26 07:48:34,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3289913.3333333335, ans=0.125 2023-11-26 07:48:35,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3289913.3333333335, ans=0.1 2023-11-26 07:48:40,728 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.44 vs. limit=15.0 2023-11-26 07:48:44,915 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493500 2023-11-26 07:48:51,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3289980.0, ans=0.125 2023-11-26 07:48:56,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3290046.6666666665, ans=0.0 2023-11-26 07:49:17,261 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 550, loss[loss=0.063, simple_loss=0.08491, pruned_loss=0.0126, audio_tagging_loss=0.007945, over 15464.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09091, pruned_loss=0.01268, audio_tagging_loss=0.009078, over 2857416.45 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:49:34,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3290246.6666666665, ans=0.125 2023-11-26 07:49:40,758 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493550 2023-11-26 07:49:51,813 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.224e+01 8.897e+01 9.489e+01 1.022e+02 1.296e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-26 07:49:57,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3290380.0, ans=0.5 2023-11-26 07:50:13,172 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 600, loss[loss=0.07979, simple_loss=0.1153, pruned_loss=0.01653, audio_tagging_loss=0.005632, over 15153.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.09144, pruned_loss=0.01265, audio_tagging_loss=0.009044, over 2900830.57 frames. 
], batch size: 55, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:50:20,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3290513.3333333335, ans=0.1 2023-11-26 07:50:24,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3290580.0, ans=0.125 2023-11-26 07:50:36,748 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493600 2023-11-26 07:50:36,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3290646.6666666665, ans=0.2 2023-11-26 07:50:41,967 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:50:52,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3290713.3333333335, ans=0.0 2023-11-26 07:51:06,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3290780.0, ans=0.125 2023-11-26 07:51:09,747 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 650, loss[loss=0.06757, simple_loss=0.07953, pruned_loss=0.01584, audio_tagging_loss=0.01197, over 15208.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.0902, pruned_loss=0.01249, audio_tagging_loss=0.00914, over 2928627.50 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:51:12,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3290846.6666666665, ans=0.0 2023-11-26 07:51:18,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3290846.6666666665, ans=0.0 2023-11-26 07:51:32,503 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493650 2023-11-26 07:51:38,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3290980.0, ans=0.0 2023-11-26 07:51:44,419 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.211e+01 8.602e+01 9.245e+01 1.014e+02 1.320e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-26 07:52:05,559 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 700, loss[loss=0.04458, simple_loss=0.04952, pruned_loss=0.005178, audio_tagging_loss=0.01465, over 14203.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08936, pruned_loss=0.01236, audio_tagging_loss=0.00918, over 2955954.45 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:52:05,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3291180.0, ans=0.0 2023-11-26 07:52:20,221 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.93 vs. 
limit=15.0 2023-11-26 07:52:29,160 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493700 2023-11-26 07:52:35,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3291313.3333333335, ans=0.125 2023-11-26 07:52:44,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3291380.0, ans=0.125 2023-11-26 07:52:52,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3291446.6666666665, ans=0.125 2023-11-26 07:52:54,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3291446.6666666665, ans=0.125 2023-11-26 07:52:58,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3291446.6666666665, ans=0.0 2023-11-26 07:53:01,093 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 750, loss[loss=0.07966, simple_loss=0.1161, pruned_loss=0.01239, audio_tagging_loss=0.009204, over 15753.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08966, pruned_loss=0.01224, audio_tagging_loss=0.009057, over 2980736.01 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:53:18,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3291580.0, ans=0.07 2023-11-26 07:53:25,243 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493750 2023-11-26 07:53:34,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3291713.3333333335, ans=0.125 2023-11-26 07:53:36,199 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.399e+01 8.579e+01 9.292e+01 9.836e+01 1.327e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-26 07:53:42,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3291713.3333333335, ans=0.125 2023-11-26 07:53:58,241 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 800, loss[loss=0.08031, simple_loss=0.1078, pruned_loss=0.01859, audio_tagging_loss=0.007807, over 15511.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.08988, pruned_loss=0.01228, audio_tagging_loss=0.009064, over 2999552.73 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:53:59,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3291846.6666666665, ans=0.125 2023-11-26 07:54:06,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3291846.6666666665, ans=0.2 2023-11-26 07:54:10,469 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.19 vs. 
limit=15.0 2023-11-26 07:54:21,042 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493800 2023-11-26 07:54:27,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3291980.0, ans=0.0 2023-11-26 07:54:38,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3292046.6666666665, ans=0.0 2023-11-26 07:54:44,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3292113.3333333335, ans=0.0 2023-11-26 07:54:52,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3292180.0, ans=0.0 2023-11-26 07:54:53,900 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 850, loss[loss=0.07186, simple_loss=0.1092, pruned_loss=0.0105, audio_tagging_loss=0.006784, over 15351.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.08992, pruned_loss=0.01241, audio_tagging_loss=0.009154, over 3008443.68 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:55:03,032 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.11 vs. limit=15.0 2023-11-26 07:55:06,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3292246.6666666665, ans=0.2 2023-11-26 07:55:16,667 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493850 2023-11-26 07:55:29,200 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.705e+01 8.678e+01 9.372e+01 1.019e+02 1.445e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-26 07:55:31,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3292380.0, ans=0.125 2023-11-26 07:55:48,984 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 900, loss[loss=0.05587, simple_loss=0.07692, pruned_loss=0.009028, audio_tagging_loss=0.008383, over 14848.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08923, pruned_loss=0.01237, audio_tagging_loss=0.009153, over 3015470.58 frames. ], batch size: 55, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:56:08,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3292580.0, ans=0.0 2023-11-26 07:56:13,198 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493900 2023-11-26 07:56:16,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3292646.6666666665, ans=15.0 2023-11-26 07:56:31,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3292713.3333333335, ans=0.0 2023-11-26 07:56:45,015 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 950, loss[loss=0.06575, simple_loss=0.08561, pruned_loss=0.01325, audio_tagging_loss=0.009695, over 14033.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09008, pruned_loss=0.01255, audio_tagging_loss=0.008923, over 3022495.97 frames. 
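
Note on the per-batch loss records above: the four logged fields are not independent; the numbers themselves imply that "loss" combines a down-weighted simple (linear-lattice) transducer loss with the pruned transducer loss and the audio-tagging distillation loss (e.g. 0.5 * 0.07953 + 0.01584 + 0.01197 = 0.06758, matching loss=0.06757 for batch 650). A minimal sketch under that inference; the scale values are read off the arithmetic, not quoted from the training script:

    def combine_losses(simple_loss: float,
                       pruned_loss: float,
                       audio_tagging_loss: float,
                       simple_loss_scale: float = 0.5,
                       audio_tagging_loss_scale: float = 1.0) -> float:
        """Reconstruct the 'loss' field of each batch record.

        Hypothetical helper: the scales are inferred from the logged
        numbers, not taken from train_asr.py itself.
        """
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    # Check against the Epoch 42, batch 650 record above:
    assert abs(combine_losses(0.07953, 0.01584, 0.01197) - 0.06757) < 1e-4
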
], batch size: 53, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:56:51,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3292846.6666666665, ans=0.125 2023-11-26 07:56:57,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3292913.3333333335, ans=0.0 2023-11-26 07:56:57,674 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.06 vs. limit=22.5 2023-11-26 07:57:03,472 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.35 vs. limit=15.0 2023-11-26 07:57:09,226 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 493950 2023-11-26 07:57:10,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3292980.0, ans=0.125 2023-11-26 07:57:11,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3292980.0, ans=15.0 2023-11-26 07:57:20,930 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.052e+01 8.738e+01 9.325e+01 9.888e+01 1.254e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 07:57:25,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3293046.6666666665, ans=0.125 2023-11-26 07:57:41,270 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1000, loss[loss=0.05849, simple_loss=0.08624, pruned_loss=0.006885, audio_tagging_loss=0.008481, over 14897.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09009, pruned_loss=0.01253, audio_tagging_loss=0.008795, over 3024683.39 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:57:41,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3293180.0, ans=0.0 2023-11-26 07:57:45,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3293180.0, ans=0.0 2023-11-26 07:57:46,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3293180.0, ans=0.125 2023-11-26 07:58:04,306 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 07:58:04,343 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494000 2023-11-26 07:58:13,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3293313.3333333335, ans=0.07 2023-11-26 07:58:23,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3293380.0, ans=0.125 2023-11-26 07:58:29,563 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.11 vs. 
limit=12.0 2023-11-26 07:58:37,609 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1050, loss[loss=0.07099, simple_loss=0.1024, pruned_loss=0.01216, audio_tagging_loss=0.007639, over 15729.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.08996, pruned_loss=0.01257, audio_tagging_loss=0.008763, over 3028919.66 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:58:38,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3293513.3333333335, ans=0.2 2023-11-26 07:58:50,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3293580.0, ans=0.0 2023-11-26 07:58:56,162 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.89 vs. limit=15.0 2023-11-26 07:59:01,678 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494050 2023-11-26 07:59:01,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3293646.6666666665, ans=0.0 2023-11-26 07:59:10,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3293713.3333333335, ans=0.125 2023-11-26 07:59:13,866 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.295e+01 8.710e+01 9.431e+01 1.020e+02 1.408e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 07:59:33,781 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1100, loss[loss=0.05492, simple_loss=0.07817, pruned_loss=0.007819, audio_tagging_loss=0.008012, over 15113.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08897, pruned_loss=0.01238, audio_tagging_loss=0.008803, over 3024242.63 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:59:34,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3293846.6666666665, ans=0.0 2023-11-26 07:59:34,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.41 vs. limit=22.5 2023-11-26 07:59:36,063 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 07:59:40,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3293846.6666666665, ans=0.125 2023-11-26 07:59:58,042 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494100 2023-11-26 08:00:04,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3293980.0, ans=0.025 2023-11-26 08:00:30,393 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1150, loss[loss=0.05309, simple_loss=0.06583, pruned_loss=0.01025, audio_tagging_loss=0.00992, over 16225.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08894, pruned_loss=0.01225, audio_tagging_loss=0.00889, over 3033556.76 frames. 
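
Note on the WARNING "Exclude cut ..." records: each excluded AudioSet cut is a 1-second clip (100 feature frames) carrying a 24-token dummy transcript, and after encoder subsampling only 23 frames remain, fewer than the number of tokens, so no valid transducer alignment exists and the cut is dropped. A sketch of such a filter; the exact convolutional frame arithmetic is an assumption that happens to reproduce the logged 23:

    def keep_cut(num_frames: int, tokens: list, subsampling_factor: int = 4) -> bool:
        """Mirror the exclusion rule suggested by the warnings (a sketch).

        T is the encoder frame count after subsampling; the (n - 7) // 4
        arithmetic is hypothetical but yields 23 for the logged
        100-frame cuts, fewer than their 24 tokens.
        """
        T = (num_frames - 7) // subsampling_factor
        return T >= len(tokens)

    tokens = ["▁D", "ummy"] * 12          # 24 tokens, like the logged example
    print(keep_cut(100, tokens))          # False -> "Exclude cut ... from training"
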
], batch size: 62, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:00:37,477 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:00:40,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3294246.6666666665, ans=0.125 2023-11-26 08:00:47,237 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.39 vs. limit=12.0 2023-11-26 08:00:53,170 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494150 2023-11-26 08:01:01,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3294313.3333333335, ans=0.0 2023-11-26 08:01:05,960 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.933e+01 8.571e+01 9.145e+01 9.893e+01 1.532e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-26 08:01:06,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3294380.0, ans=0.05 2023-11-26 08:01:25,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3294513.3333333335, ans=0.125 2023-11-26 08:01:26,212 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1200, loss[loss=0.07563, simple_loss=0.1035, pruned_loss=0.01546, audio_tagging_loss=0.008429, over 14226.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08847, pruned_loss=0.01222, audio_tagging_loss=0.008859, over 3030934.17 frames. ], batch size: 54, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 08:01:27,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3294513.3333333335, ans=0.125 2023-11-26 08:01:30,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3294513.3333333335, ans=0.0 2023-11-26 08:01:33,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3294513.3333333335, ans=0.07 2023-11-26 08:01:37,300 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.98 vs. limit=6.0 2023-11-26 08:01:46,071 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:01:48,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3294646.6666666665, ans=0.0 2023-11-26 08:01:49,750 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494200 2023-11-26 08:02:02,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3294713.3333333335, ans=0.0 2023-11-26 08:02:11,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3294780.0, ans=0.125 2023-11-26 08:02:22,017 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1250, loss[loss=0.05552, simple_loss=0.06994, pruned_loss=0.009995, audio_tagging_loss=0.01055, over 15562.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08872, pruned_loss=0.01208, audio_tagging_loss=0.00891, over 3034052.39 frames. 
], batch size: 60, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 08:02:41,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3294913.3333333335, ans=0.2 2023-11-26 08:02:46,670 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494250 2023-11-26 08:02:49,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3294980.0, ans=0.0 2023-11-26 08:02:58,374 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.703e+01 8.635e+01 9.244e+01 9.927e+01 1.336e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-26 08:03:02,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3295046.6666666665, ans=0.0 2023-11-26 08:03:09,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3295113.3333333335, ans=0.0 2023-11-26 08:03:10,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3295113.3333333335, ans=0.05 2023-11-26 08:03:18,631 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1300, loss[loss=0.04962, simple_loss=0.0706, pruned_loss=0.007266, audio_tagging_loss=0.007053, over 15860.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08939, pruned_loss=0.0122, audio_tagging_loss=0.008817, over 3036187.98 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 08:03:22,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3295180.0, ans=0.125 2023-11-26 08:03:22,946 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.40 vs. limit=15.0 2023-11-26 08:03:24,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3295180.0, ans=0.0 2023-11-26 08:03:29,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3295246.6666666665, ans=0.125 2023-11-26 08:03:42,029 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494300 2023-11-26 08:04:09,013 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.10 vs. limit=15.0 2023-11-26 08:04:14,948 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1350, loss[loss=0.06693, simple_loss=0.09156, pruned_loss=0.01106, audio_tagging_loss=0.01009, over 14918.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08987, pruned_loss=0.01204, audio_tagging_loss=0.00887, over 3042968.91 frames. 
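
Note on the optim.py "Clipping_scale=2.0, grad-norm quartiles ..." records: the five values are quantiles (min, 25%, median, 75%, max) of recent gradient norms, and in every record here the threshold equals clipping_scale times the logged median (e.g. 2.0 * 9.245e+01 = 1.849e+02), with percent-clipped=0.0 whenever the max stays below it. A sketch of that bookkeeping; the window size and quantile estimator are assumptions:

    import torch

    def clipping_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
        """Summarize a window of gradient norms the way these log lines do.

        grad_norms: 1-D tensor of recent per-step gradient norms (the real
        window length is an assumption here).
        """
        q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2].item()          # 2.0 x median
        pct = (grad_norms > threshold).float().mean().item() * 100.0
        return q.tolist(), threshold, pct

    norms = torch.tensor([72.11, 86.02, 92.45, 101.4, 132.0])
    print(clipping_stats(norms))   # threshold = 184.9, percent-clipped = 0.0
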
], batch size: 56, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 08:04:28,158 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:04:38,705 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494350 2023-11-26 08:04:39,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3295646.6666666665, ans=0.0 2023-11-26 08:04:40,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3295646.6666666665, ans=0.1 2023-11-26 08:04:51,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3295713.3333333335, ans=0.125 2023-11-26 08:04:52,464 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.588e+01 8.816e+01 9.406e+01 1.018e+02 1.240e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 08:04:55,806 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 08:04:55,968 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:05:00,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3295780.0, ans=0.0 2023-11-26 08:05:10,829 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1400, loss[loss=0.07503, simple_loss=0.09143, pruned_loss=0.0172, audio_tagging_loss=0.01211, over 13887.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08947, pruned_loss=0.01204, audio_tagging_loss=0.008931, over 3040249.01 frames. ], batch size: 53, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:05:21,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3295913.3333333335, ans=0.125 2023-11-26 08:05:34,863 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494400 2023-11-26 08:06:06,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3296180.0, ans=0.2 2023-11-26 08:06:07,657 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1450, loss[loss=0.06902, simple_loss=0.09328, pruned_loss=0.0139, audio_tagging_loss=0.008478, over 14949.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09049, pruned_loss=0.01238, audio_tagging_loss=0.008922, over 3039631.99 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:06:14,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3296180.0, ans=0.125 2023-11-26 08:06:30,403 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.87 vs. 
limit=15.0 2023-11-26 08:06:31,013 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494450 2023-11-26 08:06:38,919 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2023-11-26 08:06:40,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3296380.0, ans=0.0 2023-11-26 08:06:44,221 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.307e+01 8.868e+01 9.341e+01 9.992e+01 1.188e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 08:06:44,878 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.77 vs. limit=15.0 2023-11-26 08:07:04,080 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1500, loss[loss=0.07454, simple_loss=0.09575, pruned_loss=0.0185, audio_tagging_loss=0.008171, over 15248.00 frames. ], tot_loss[loss=0.0674, simple_loss=0.09154, pruned_loss=0.01267, audio_tagging_loss=0.008961, over 3050339.83 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:07:08,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3296513.3333333335, ans=0.2 2023-11-26 08:07:10,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3296513.3333333335, ans=0.0 2023-11-26 08:07:18,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3296580.0, ans=0.125 2023-11-26 08:07:27,041 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494500 2023-11-26 08:07:45,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3296713.3333333335, ans=0.125 2023-11-26 08:07:59,531 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1550, loss[loss=0.06578, simple_loss=0.08601, pruned_loss=0.01466, audio_tagging_loss=0.008116, over 14807.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.09123, pruned_loss=0.01269, audio_tagging_loss=0.009011, over 3042913.58 frames. 
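
Note on the scaling.py "ScheduledFloat: name=..., batch_count=..., ans=..." records: the dropout probabilities, skip rates, and balancer probs being logged are not constants but values evaluated from a schedule over batch_count. A plausible minimal re-implementation, assuming a piecewise-linear schedule over (batch_count, value) breakpoints; the breakpoints below are illustrative, not the ones used in scaling.py:

    import bisect

    class ScheduledFloat:
        """Piecewise-linear schedule over batch_count (illustrative sketch)."""

        def __init__(self, *points):            # points: sorted (batch_count, value)
            self.x = [p[0] for p in points]
            self.y = [p[1] for p in points]

        def value(self, batch_count: float) -> float:
            if batch_count <= self.x[0]:
                return self.y[0]
            if batch_count >= self.x[-1]:
                return self.y[-1]
            i = bisect.bisect_right(self.x, batch_count)
            x0, x1 = self.x[i - 1], self.x[i]
            y0, y1 = self.y[i - 1], self.y[i]
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

    # e.g. a dropout_p decaying from 0.3 to 0.1 over the first 20k batches
    # (made-up breakpoints for illustration):
    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p.value(3290513.33))   # -> 0.1, as in the records above
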
], batch size: 55, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:08:07,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3296846.6666666665, ans=0.125 2023-11-26 08:08:08,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3296846.6666666665, ans=0.07 2023-11-26 08:08:19,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3296913.3333333335, ans=0.125 2023-11-26 08:08:20,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3296980.0, ans=0.125 2023-11-26 08:08:22,803 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494550 2023-11-26 08:08:34,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3297046.6666666665, ans=0.125 2023-11-26 08:08:34,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3297046.6666666665, ans=0.125 2023-11-26 08:08:36,563 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.682e+01 8.636e+01 9.426e+01 1.014e+02 1.319e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 08:08:42,469 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.60 vs. limit=15.0 2023-11-26 08:08:51,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3297113.3333333335, ans=0.125 2023-11-26 08:08:55,624 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1600, loss[loss=0.05685, simple_loss=0.07896, pruned_loss=0.009904, audio_tagging_loss=0.007468, over 15495.00 frames. ], tot_loss[loss=0.06724, simple_loss=0.09101, pruned_loss=0.01273, audio_tagging_loss=0.00901, over 3045230.12 frames. ], batch size: 60, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 08:09:18,994 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494600 2023-11-26 08:09:45,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3297446.6666666665, ans=15.0 2023-11-26 08:09:51,641 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1650, loss[loss=0.07539, simple_loss=0.09944, pruned_loss=0.01649, audio_tagging_loss=0.00919, over 15757.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09052, pruned_loss=0.01263, audio_tagging_loss=0.009012, over 3043906.83 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 08:09:57,523 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.39 vs. 
limit=6.0 2023-11-26 08:10:05,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3297580.0, ans=0.1 2023-11-26 08:10:15,088 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494650 2023-11-26 08:10:28,308 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.653e+01 8.801e+01 9.353e+01 1.001e+02 1.567e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 08:10:47,516 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1700, loss[loss=0.07025, simple_loss=0.09348, pruned_loss=0.01259, audio_tagging_loss=0.01092, over 14542.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09058, pruned_loss=0.01259, audio_tagging_loss=0.009041, over 3052004.43 frames. ], batch size: 54, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 08:10:49,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3297846.6666666665, ans=0.2 2023-11-26 08:10:53,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3297846.6666666665, ans=0.2 2023-11-26 08:10:59,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3297913.3333333335, ans=0.1 2023-11-26 08:11:10,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3297980.0, ans=0.0 2023-11-26 08:11:10,993 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494700 2023-11-26 08:11:19,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3297980.0, ans=0.1 2023-11-26 08:11:23,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3298046.6666666665, ans=0.1 2023-11-26 08:11:29,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3298046.6666666665, ans=0.125 2023-11-26 08:11:31,539 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.59 vs. limit=22.5 2023-11-26 08:11:42,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3298180.0, ans=0.0 2023-11-26 08:11:43,206 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1750, loss[loss=0.08257, simple_loss=0.1217, pruned_loss=0.0161, audio_tagging_loss=0.005635, over 15995.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.0899, pruned_loss=0.01256, audio_tagging_loss=0.009051, over 3061268.24 frames. 
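
Note on the "Whitening: name=..., metric=X vs. limit=Y" records: a whitening statistic of a module's activations is measured and compared against that module's whitening_limit, presumably so an auxiliary penalty engages only when the metric exceeds the limit. One common whiteness measure is the ratio E[lambda^2] / E[lambda]^2 over the eigenvalues of the channel covariance, which is 1.0 for a perfectly white (isotropic) covariance and grows as energy concentrates in few directions; the exact formula in scaling.py may differ, so this is a sketch:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        """Rough whiteness measure of activations x, shape (frames, channels).

        Returns E[lambda^2] / E[lambda]^2 over eigenvalues of the channel
        covariance: 1.0 when fully 'white', larger otherwise. Assumed
        formula, for illustration only.
        """
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.t() @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return float((eigs ** 2).mean() / eigs.mean() ** 2)

    x = torch.randn(1000, 192) @ torch.randn(192, 192)   # correlated channels
    metric, limit = whitening_metric(x), 15.0
    print(f"metric={metric:.2f} vs. limit={limit}")      # penalty only if metric > limit
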
], batch size: 59, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:11:49,260 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:11:52,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=3298180.0, ans=0.05 2023-11-26 08:11:55,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3298246.6666666665, ans=0.125 2023-11-26 08:11:55,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3298246.6666666665, ans=0.0 2023-11-26 08:11:59,569 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.01 vs. limit=10.0 2023-11-26 08:12:04,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3298313.3333333335, ans=15.0 2023-11-26 08:12:06,566 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494750 2023-11-26 08:12:12,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3298313.3333333335, ans=0.2 2023-11-26 08:12:21,373 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.536e+01 8.900e+01 9.496e+01 1.021e+02 1.422e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-26 08:12:23,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3298380.0, ans=0.1 2023-11-26 08:12:32,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3298446.6666666665, ans=0.2 2023-11-26 08:12:33,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3298446.6666666665, ans=0.125 2023-11-26 08:12:38,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3298513.3333333335, ans=0.1 2023-11-26 08:12:39,607 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1800, loss[loss=0.05729, simple_loss=0.06655, pruned_loss=0.01438, audio_tagging_loss=0.009632, over 14772.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08968, pruned_loss=0.01253, audio_tagging_loss=0.009015, over 3055653.40 frames. 
], batch size: 56, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:12:39,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3298513.3333333335, ans=0.2 2023-11-26 08:12:46,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3298513.3333333335, ans=0.0 2023-11-26 08:12:49,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3298580.0, ans=0.125 2023-11-26 08:12:50,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=3298580.0, ans=0.95 2023-11-26 08:12:50,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3298580.0, ans=0.1 2023-11-26 08:13:03,006 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494800 2023-11-26 08:13:04,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3298646.6666666665, ans=0.125 2023-11-26 08:13:35,228 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1850, loss[loss=0.06841, simple_loss=0.0922, pruned_loss=0.01331, audio_tagging_loss=0.009004, over 15375.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09004, pruned_loss=0.01251, audio_tagging_loss=0.008949, over 3059149.37 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 8.0 2023-11-26 08:13:35,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3298846.6666666665, ans=0.2 2023-11-26 08:13:41,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3298846.6666666665, ans=0.125 2023-11-26 08:13:44,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3298846.6666666665, ans=0.125 2023-11-26 08:13:58,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3298980.0, ans=0.2 2023-11-26 08:13:59,430 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494850 2023-11-26 08:14:03,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3298980.0, ans=0.125 2023-11-26 08:14:14,832 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.613e+01 8.832e+01 9.434e+01 1.017e+02 1.223e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 08:14:17,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3299046.6666666665, ans=0.125 2023-11-26 08:14:31,426 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.72 vs. limit=15.0 2023-11-26 08:14:31,937 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1900, loss[loss=0.08403, simple_loss=0.1145, pruned_loss=0.0188, audio_tagging_loss=0.00797, over 15462.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.08986, pruned_loss=0.01242, audio_tagging_loss=0.008875, over 3061579.64 frames. 
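
Note on the tot_loss[... over N frames.] fields: N grows from one batch's worth of frames at the start of the epoch toward a plateau near 3.0e6, which is what an exponentially-decayed running total produces (steady state = frames_per_batch / decay_rate). A sketch under that assumption; the decay constant of 1/200 is inferred from the ~3.0e6 plateau for ~15k-frame batches, not a quoted setting:

    def update_tot_loss(tot_frames, tot_weighted, batch_frames, batch_loss,
                        reset_interval=200):
        """Decayed running totals behind tot_loss[...] (a sketch)."""
        decay = 1.0 - 1.0 / reset_interval
        tot_frames = tot_frames * decay + batch_frames
        tot_weighted = tot_weighted * decay + batch_loss * batch_frames
        return tot_frames, tot_weighted, tot_weighted / tot_frames

    frames, wloss = 0.0, 0.0
    for _ in range(2000):                       # many ~15k-frame batches
        frames, wloss, avg = update_tot_loss(frames, wloss, 15000, 0.066)
    print(f"tot_loss[loss={avg:.4f}, over {frames:.2f} frames.]")  # ~3.0e6 frames
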
], batch size: 58, lr: 1.62e-03, grad_scale: 8.0 2023-11-26 08:14:35,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3299180.0, ans=0.0 2023-11-26 08:14:40,257 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.29 vs. limit=12.0 2023-11-26 08:14:40,373 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.84 vs. limit=6.0 2023-11-26 08:14:43,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3299246.6666666665, ans=0.125 2023-11-26 08:14:51,374 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.95 vs. limit=15.0 2023-11-26 08:14:55,258 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494900 2023-11-26 08:15:07,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3299380.0, ans=0.09899494936611666 2023-11-26 08:15:07,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3299380.0, ans=0.125 2023-11-26 08:15:27,392 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 1950, loss[loss=0.06316, simple_loss=0.08928, pruned_loss=0.01192, audio_tagging_loss=0.0066, over 15524.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.0898, pruned_loss=0.01248, audio_tagging_loss=0.008836, over 3058168.77 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 8.0 2023-11-26 08:15:51,030 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 494950 2023-11-26 08:16:01,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3299713.3333333335, ans=0.125 2023-11-26 08:16:06,893 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.282e+01 8.592e+01 9.452e+01 9.958e+01 1.219e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-26 08:16:08,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3299713.3333333335, ans=0.1 2023-11-26 08:16:09,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3299713.3333333335, ans=0.2 2023-11-26 08:16:14,066 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:16:17,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3299780.0, ans=0.0 2023-11-26 08:16:23,485 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2000, loss[loss=0.07608, simple_loss=0.0998, pruned_loss=0.01639, audio_tagging_loss=0.009792, over 14897.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.0896, pruned_loss=0.01258, audio_tagging_loss=0.008886, over 3055521.39 frames. ], batch size: 54, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:16:42,324 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.39 vs. 
limit=15.0 2023-11-26 08:16:47,416 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495000 2023-11-26 08:16:55,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3299980.0, ans=0.125 2023-11-26 08:17:10,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3300113.3333333335, ans=0.0 2023-11-26 08:17:19,874 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2050, loss[loss=0.06059, simple_loss=0.07453, pruned_loss=0.01282, audio_tagging_loss=0.01051, over 15675.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.08921, pruned_loss=0.01274, audio_tagging_loss=0.00888, over 3046864.16 frames. ], batch size: 60, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:17:20,268 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.03 vs. limit=15.0 2023-11-26 08:17:30,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3300246.6666666665, ans=0.125 2023-11-26 08:17:33,876 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.53 vs. limit=6.0 2023-11-26 08:17:34,878 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.14 vs. limit=15.0 2023-11-26 08:17:43,467 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495050 2023-11-26 08:17:56,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3300380.0, ans=0.2 2023-11-26 08:17:58,313 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.946e+01 8.680e+01 9.276e+01 1.017e+02 1.208e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 08:18:07,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3300446.6666666665, ans=0.0 2023-11-26 08:18:16,046 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2100, loss[loss=0.06852, simple_loss=0.09736, pruned_loss=0.009969, audio_tagging_loss=0.009872, over 14363.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08894, pruned_loss=0.01264, audio_tagging_loss=0.008873, over 3045201.74 frames. ], batch size: 55, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:18:17,644 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.80 vs. limit=6.0 2023-11-26 08:18:21,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3300513.3333333335, ans=0.125 2023-11-26 08:18:34,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3300580.0, ans=0.125 2023-11-26 08:18:39,017 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495100 2023-11-26 08:18:55,002 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.46 vs. limit=15.0 2023-11-26 08:19:05,171 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.36 vs. 
limit=22.5 2023-11-26 08:19:08,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3300780.0, ans=0.0 2023-11-26 08:19:11,871 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2150, loss[loss=0.05303, simple_loss=0.07104, pruned_loss=0.007962, audio_tagging_loss=0.009546, over 15065.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08906, pruned_loss=0.01254, audio_tagging_loss=0.008891, over 3054032.19 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:19:12,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3300846.6666666665, ans=0.2 2023-11-26 08:19:20,910 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.07 vs. limit=15.0 2023-11-26 08:19:35,808 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495150 2023-11-26 08:19:43,400 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:19:44,898 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 08:19:47,584 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.37 vs. limit=15.0 2023-11-26 08:19:51,200 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.490e+01 8.871e+01 9.357e+01 1.023e+02 1.211e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 08:20:04,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3301113.3333333335, ans=0.0 2023-11-26 08:20:04,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3301113.3333333335, ans=0.0 2023-11-26 08:20:07,205 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2200, loss[loss=0.05763, simple_loss=0.07578, pruned_loss=0.01163, audio_tagging_loss=0.008107, over 14713.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.0903, pruned_loss=0.01251, audio_tagging_loss=0.008807, over 3049715.87 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:20:27,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3301246.6666666665, ans=0.125 2023-11-26 08:20:31,785 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495200 2023-11-26 08:20:38,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3301313.3333333335, ans=0.125 2023-11-26 08:20:57,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3301446.6666666665, ans=0.1 2023-11-26 08:21:04,551 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2250, loss[loss=0.07257, simple_loss=0.1021, pruned_loss=0.01431, audio_tagging_loss=0.007224, over 16040.00 frames. 
], tot_loss[loss=0.0667, simple_loss=0.09083, pruned_loss=0.01248, audio_tagging_loss=0.008808, over 3049423.30 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:21:16,698 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.04 vs. limit=15.0 2023-11-26 08:21:19,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3301580.0, ans=0.2 2023-11-26 08:21:23,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3301580.0, ans=0.0 2023-11-26 08:21:27,595 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495250 2023-11-26 08:21:43,620 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 8.823e+01 9.427e+01 1.035e+02 1.716e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 08:21:44,140 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.17 vs. limit=15.0 2023-11-26 08:22:00,206 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2300, loss[loss=0.05322, simple_loss=0.06539, pruned_loss=0.01133, audio_tagging_loss=0.009189, over 14518.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09, pruned_loss=0.01239, audio_tagging_loss=0.008945, over 3046850.48 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:22:15,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3301913.3333333335, ans=0.0 2023-11-26 08:22:20,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3301913.3333333335, ans=0.1 2023-11-26 08:22:23,225 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495300 2023-11-26 08:22:39,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3302046.6666666665, ans=0.1 2023-11-26 08:22:45,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3302113.3333333335, ans=0.0 2023-11-26 08:22:46,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3302113.3333333335, ans=0.0 2023-11-26 08:22:48,095 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 08:22:55,590 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2350, loss[loss=0.07571, simple_loss=0.106, pruned_loss=0.01582, audio_tagging_loss=0.006903, over 14848.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09052, pruned_loss=0.01243, audio_tagging_loss=0.008902, over 3041772.18 frames. 
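
Note on the learning rate: it slips from 1.62e-03 to 1.61e-03 between the batch 2200 and 2250 records, mid-epoch, which is consistent with a smooth per-batch schedule of the Eden style used in icefall, lr = base_lr * f(step) * g(epoch). A sketch of that form; the constants below are assumptions chosen for illustration, though they do reproduce roughly 1.6e-03 at this point in training:

    def eden_lr(base_lr, step, epoch, lr_batches=7500.0, lr_epochs=3.5):
        """Eden-style LR schedule (sketch): smooth decay in step and epoch.

        Constants are illustrative assumptions; base_lr=0.045 with
        step ~ 494000 and epoch ~ 42 gives ~1.6e-03, in line with the
        1.61e-03 / 1.62e-03 values in the records above.
        """
        step_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * step_factor * epoch_factor

    print(f"lr: {eden_lr(0.045, 494000, 42):.2e}")   # ~1.60e-03
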
], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:22:58,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3302180.0, ans=0.1 2023-11-26 08:23:02,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3302180.0, ans=0.0 2023-11-26 08:23:07,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3302246.6666666665, ans=0.0 2023-11-26 08:23:20,227 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495350 2023-11-26 08:23:21,970 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.10 vs. limit=22.5 2023-11-26 08:23:32,957 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:23:34,833 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.436e+01 8.737e+01 9.480e+01 1.014e+02 1.457e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-26 08:23:46,805 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.24 vs. limit=15.0 2023-11-26 08:23:51,966 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2400, loss[loss=0.06318, simple_loss=0.08838, pruned_loss=0.009376, audio_tagging_loss=0.009611, over 14711.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09116, pruned_loss=0.01259, audio_tagging_loss=0.008954, over 3038848.32 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:24:15,477 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495400 2023-11-26 08:24:17,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3302646.6666666665, ans=0.0 2023-11-26 08:24:18,422 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.35 vs. limit=15.0 2023-11-26 08:24:39,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3302780.0, ans=0.0 2023-11-26 08:24:46,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3302780.0, ans=0.0 2023-11-26 08:24:48,736 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2450, loss[loss=0.07085, simple_loss=0.09685, pruned_loss=0.0105, audio_tagging_loss=0.01193, over 16625.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09071, pruned_loss=0.01261, audio_tagging_loss=0.009027, over 3044646.53 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:24:49,210 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.23 vs. 
limit=15.0 2023-11-26 08:25:01,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3302913.3333333335, ans=0.0 2023-11-26 08:25:08,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3302913.3333333335, ans=0.125 2023-11-26 08:25:11,619 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495450 2023-11-26 08:25:14,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3302980.0, ans=0.2 2023-11-26 08:25:19,724 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:25:28,951 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.831e+01 8.728e+01 9.406e+01 1.027e+02 1.574e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 08:25:43,709 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2500, loss[loss=0.04992, simple_loss=0.06861, pruned_loss=0.005089, audio_tagging_loss=0.01053, over 15459.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.08993, pruned_loss=0.01235, audio_tagging_loss=0.00909, over 3043914.56 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:25:47,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3303180.0, ans=0.0 2023-11-26 08:25:52,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3303180.0, ans=0.125 2023-11-26 08:26:01,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3303246.6666666665, ans=0.125 2023-11-26 08:26:07,844 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495500 2023-11-26 08:26:35,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3303446.6666666665, ans=0.125 2023-11-26 08:26:39,673 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2550, loss[loss=0.06776, simple_loss=0.09341, pruned_loss=0.01198, audio_tagging_loss=0.009079, over 15647.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08847, pruned_loss=0.01226, audio_tagging_loss=0.009037, over 3043312.16 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:26:43,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3303513.3333333335, ans=0.125 2023-11-26 08:26:47,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3303513.3333333335, ans=0.2 2023-11-26 08:27:03,211 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495550 2023-11-26 08:27:19,457 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.178e+01 8.568e+01 9.109e+01 1.007e+02 1.472e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-26 08:27:35,777 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2600, loss[loss=0.07309, simple_loss=0.1018, pruned_loss=0.01487, audio_tagging_loss=0.007322, over 15708.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08824, pruned_loss=0.01218, audio_tagging_loss=0.008892, over 3044757.26 frames. 
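
Note on the grad_scale field: it toggles between 16.0 and 32.0 across the batch records, the signature of dynamic loss scaling under mixed precision, where the scaler doubles the scale after a run of overflow-free steps and halves it when gradients overflow. A minimal sketch with the standard torch.cuda.amp API; the model and optimizer are hypothetical stand-ins:

    import torch

    model = torch.nn.Linear(80, 500)                 # stand-in model
    optimizer = torch.optim.SGD(model.parameters(), lr=1.6e-3)
    scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)

    def train_step(features, targets):
        optimizer.zero_grad()
        with torch.autocast(device_type="cuda", dtype=torch.float16,
                            enabled=torch.cuda.is_available()):
            loss = torch.nn.functional.mse_loss(model(features), targets)
        scaler.scale(loss).backward()
        scaler.step(optimizer)       # skips the step on inf/nan gradients
        scaler.update()              # e.g. grows 16 -> 32 after stable steps
        return scaler.get_scale()    # the 'grad_scale' seen in these records
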
], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:27:36,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3303846.6666666665, ans=0.0 2023-11-26 08:27:39,520 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.52 vs. limit=15.0 2023-11-26 08:27:42,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3303846.6666666665, ans=0.125 2023-11-26 08:27:54,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3303913.3333333335, ans=0.2 2023-11-26 08:27:55,449 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.82 vs. limit=15.0 2023-11-26 08:27:58,817 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495600 2023-11-26 08:28:09,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3304046.6666666665, ans=0.1 2023-11-26 08:28:14,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3304046.6666666665, ans=0.07 2023-11-26 08:28:27,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3304113.3333333335, ans=0.0 2023-11-26 08:28:31,464 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2650, loss[loss=0.06321, simple_loss=0.08929, pruned_loss=0.01007, audio_tagging_loss=0.008502, over 15534.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08904, pruned_loss=0.01229, audio_tagging_loss=0.008832, over 3047835.52 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:28:31,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3304180.0, ans=0.0 2023-11-26 08:28:32,223 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.87 vs. limit=15.0 2023-11-26 08:28:36,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3304180.0, ans=0.0 2023-11-26 08:28:44,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3304246.6666666665, ans=0.0 2023-11-26 08:28:54,936 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495650 2023-11-26 08:28:59,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3304313.3333333335, ans=0.125 2023-11-26 08:29:11,961 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.984e+01 8.718e+01 9.187e+01 9.929e+01 1.273e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-26 08:29:27,434 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2700, loss[loss=0.07875, simple_loss=0.1041, pruned_loss=0.01704, audio_tagging_loss=0.009657, over 15006.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08978, pruned_loss=0.01235, audio_tagging_loss=0.008652, over 3050769.42 frames. 
], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:29:51,633 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495700 2023-11-26 08:30:08,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3304713.3333333335, ans=0.125 2023-11-26 08:30:08,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3304713.3333333335, ans=0.125 2023-11-26 08:30:23,770 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2750, loss[loss=0.0674, simple_loss=0.09821, pruned_loss=0.01037, audio_tagging_loss=0.007925, over 15636.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08853, pruned_loss=0.01214, audio_tagging_loss=0.00869, over 3052495.62 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:30:27,608 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.66 vs. limit=12.0 2023-11-26 08:30:31,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3304846.6666666665, ans=0.125 2023-11-26 08:30:33,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3304846.6666666665, ans=0.125 2023-11-26 08:30:34,598 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.77 vs. limit=15.0 2023-11-26 08:30:41,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3304913.3333333335, ans=0.125 2023-11-26 08:30:41,960 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.34 vs. limit=12.0 2023-11-26 08:30:46,676 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495750 2023-11-26 08:30:57,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3305046.6666666665, ans=0.1 2023-11-26 08:31:03,383 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.804e+01 8.931e+01 9.557e+01 1.024e+02 1.484e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 08:31:08,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3305113.3333333335, ans=0.125 2023-11-26 08:31:10,388 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 08:31:17,977 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.10 vs. limit=15.0 2023-11-26 08:31:19,453 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2800, loss[loss=0.06672, simple_loss=0.08965, pruned_loss=0.01134, audio_tagging_loss=0.01055, over 14956.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08891, pruned_loss=0.01221, audio_tagging_loss=0.008739, over 3054320.07 frames. 
], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:31:24,404 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.32 vs. limit=15.0 2023-11-26 08:31:30,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.out_whiten.whitening_limit, batch_count=3305246.6666666665, ans=8.0 2023-11-26 08:31:43,135 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495800 2023-11-26 08:31:46,133 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.56 vs. limit=15.0 2023-11-26 08:31:47,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3305313.3333333335, ans=0.0 2023-11-26 08:31:51,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3305313.3333333335, ans=0.1 2023-11-26 08:31:51,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3305313.3333333335, ans=0.125 2023-11-26 08:31:52,689 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:31:58,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3305380.0, ans=0.0 2023-11-26 08:32:12,703 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.74 vs. limit=15.0 2023-11-26 08:32:15,821 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2850, loss[loss=0.06034, simple_loss=0.08061, pruned_loss=0.0105, audio_tagging_loss=0.009536, over 16031.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08996, pruned_loss=0.01242, audio_tagging_loss=0.00859, over 3052353.50 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:32:21,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3305513.3333333335, ans=0.125 2023-11-26 08:32:35,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3305580.0, ans=0.125 2023-11-26 08:32:38,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3305646.6666666665, ans=0.0 2023-11-26 08:32:39,490 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495850 2023-11-26 08:32:55,960 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.651e+01 8.658e+01 9.306e+01 1.021e+02 1.225e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 08:33:11,936 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2900, loss[loss=0.07223, simple_loss=0.1026, pruned_loss=0.01216, audio_tagging_loss=0.008755, over 16044.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.0891, pruned_loss=0.01224, audio_tagging_loss=0.008631, over 3050243.05 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:33:16,419 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.16 vs. limit=15.0 2023-11-26 08:33:21,397 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.08 vs. 
limit=12.0 2023-11-26 08:33:35,380 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495900 2023-11-26 08:33:36,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3305980.0, ans=10.0 2023-11-26 08:33:56,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3306113.3333333335, ans=0.125 2023-11-26 08:33:58,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3306113.3333333335, ans=0.125 2023-11-26 08:34:07,732 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 2950, loss[loss=0.05544, simple_loss=0.07352, pruned_loss=0.009662, audio_tagging_loss=0.009016, over 14714.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08996, pruned_loss=0.01232, audio_tagging_loss=0.008665, over 3051011.57 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:34:15,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3306180.0, ans=0.125 2023-11-26 08:34:31,570 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 495950 2023-11-26 08:34:31,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3306313.3333333335, ans=0.0 2023-11-26 08:34:44,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3306380.0, ans=0.125 2023-11-26 08:34:48,000 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.570e+01 8.833e+01 9.371e+01 1.025e+02 1.490e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-26 08:35:01,922 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.72 vs. limit=15.0 2023-11-26 08:35:03,553 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3000, loss[loss=0.06649, simple_loss=0.09494, pruned_loss=0.01011, audio_tagging_loss=0.008911, over 15728.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08988, pruned_loss=0.0124, audio_tagging_loss=0.008768, over 3051225.82 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:35:03,554 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-26 08:35:31,410 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.0048, 3.9965, 4.8336, 4.4678], device='cuda:0') 2023-11-26 08:35:36,321 INFO [train_asr.py:1267] (0/4) Epoch 42, validation: loss=0.05776, simple_loss=0.05062, pruned_loss=0.005203, audio_tagging_loss=0.02725, over 4681554.00 frames. 
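A note on the optim.py:476 entries scattered through this log: each prints five order statistics (min, 25%, median, 75%, max) of recent per-batch gradient norms together with the active clipping threshold, and in every entry the threshold equals Clipping_scale times the logged median (for example, 2.0 * 9.371e+01 ≈ 1.874e+02 in the entry just above). A minimal standalone sketch of that bookkeeping follows; the rolling-window size, class name, and rescale-on-clip step are assumptions rather than the actual optimizer internals, and only the threshold rule is inferred from the printed numbers.

```python
# Hedged sketch of the grad-norm diagnostics logged by optim.py:476.
# Only "threshold = clipping_scale * median of recent norms" is inferred
# from the logged values; window size and names are invented.
import torch

class GradNormMonitor:
    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.window = window
        self.norms: list[float] = []

    def step(self, params) -> bool:
        grads = [p.grad for p in params if p.grad is not None]
        # Global gradient norm of this batch.
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms = (self.norms + [norm])[-self.window :]
        # Quantiles over the recent window, as printed in the log.
        qs = torch.quantile(
            torch.tensor(self.norms),
            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]),
        )
        threshold = self.clipping_scale * qs[2].item()  # scale * median
        clipped = norm > threshold
        if clipped:
            # Rescale gradients down to the threshold.
            for g in grads:
                g.mul_(threshold / norm)
        print(f"grad-norm quartiles {qs.tolist()}, "
              f"threshold={threshold:.4g}, clipped={clipped}")
        return clipped
```

Under this rule, the percent-clipped=0.0 seen in every entry of this stretch of training simply means that no recent batch's gradient norm exceeded twice the rolling median.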
2023-11-26 08:35:36,322 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-26 08:35:54,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3306580.0, ans=15.0 2023-11-26 08:35:57,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3306646.6666666665, ans=0.025 2023-11-26 08:35:59,079 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496000 2023-11-26 08:36:00,407 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-496000.pt 2023-11-26 08:36:06,757 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.27 vs. limit=15.0 2023-11-26 08:36:08,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3306646.6666666665, ans=0.1 2023-11-26 08:36:31,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3306780.0, ans=0.125 2023-11-26 08:36:33,634 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3050, loss[loss=0.0788, simple_loss=0.09809, pruned_loss=0.0194, audio_tagging_loss=0.01035, over 15287.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09059, pruned_loss=0.01259, audio_tagging_loss=0.008813, over 3052135.58 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:36:38,592 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:36:41,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3306846.6666666665, ans=0.2 2023-11-26 08:36:54,368 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.51 vs. limit=15.0 2023-11-26 08:36:57,593 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496050 2023-11-26 08:37:05,387 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 08:37:12,586 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.48 vs. limit=15.0 2023-11-26 08:37:13,910 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.281e+01 8.651e+01 9.305e+01 1.008e+02 1.239e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 08:37:14,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3307046.6666666665, ans=0.1 2023-11-26 08:37:29,292 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3100, loss[loss=0.05059, simple_loss=0.06724, pruned_loss=0.005771, audio_tagging_loss=0.0112, over 14081.00 frames. 
], tot_loss[loss=0.06665, simple_loss=0.0903, pruned_loss=0.01267, audio_tagging_loss=0.008823, over 3047125.51 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:37:29,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3307180.0, ans=0.125 2023-11-26 08:37:34,332 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:37:50,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3307246.6666666665, ans=0.125 2023-11-26 08:37:53,478 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496100 2023-11-26 08:37:54,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3307313.3333333335, ans=0.125 2023-11-26 08:38:06,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3307380.0, ans=0.2 2023-11-26 08:38:22,043 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.89 vs. limit=6.0 2023-11-26 08:38:25,837 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3150, loss[loss=0.04907, simple_loss=0.06766, pruned_loss=0.007255, audio_tagging_loss=0.007982, over 15402.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09094, pruned_loss=0.01271, audio_tagging_loss=0.008809, over 3045564.83 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:38:27,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3307513.3333333335, ans=0.125 2023-11-26 08:38:37,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3307580.0, ans=0.125 2023-11-26 08:38:39,737 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.71 vs. limit=12.0 2023-11-26 08:38:49,550 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496150 2023-11-26 08:38:58,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3307713.3333333335, ans=0.0 2023-11-26 08:39:06,519 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.692e+01 8.861e+01 9.326e+01 1.004e+02 1.383e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 08:39:06,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3307713.3333333335, ans=0.125 2023-11-26 08:39:08,144 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.31 vs. limit=15.0 2023-11-26 08:39:12,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3307780.0, ans=0.125 2023-11-26 08:39:22,073 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3200, loss[loss=0.07599, simple_loss=0.1029, pruned_loss=0.01558, audio_tagging_loss=0.008972, over 15508.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.09147, pruned_loss=0.01269, audio_tagging_loss=0.008867, over 3052821.32 frames. 
], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:39:26,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3307846.6666666665, ans=0.0 2023-11-26 08:39:37,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3307913.3333333335, ans=0.1 2023-11-26 08:39:42,974 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.86 vs. limit=15.0 2023-11-26 08:39:45,630 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496200 2023-11-26 08:39:52,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3307980.0, ans=0.125 2023-11-26 08:39:58,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3308046.6666666665, ans=0.2 2023-11-26 08:40:04,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3308046.6666666665, ans=0.125 2023-11-26 08:40:18,473 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3250, loss[loss=0.08491, simple_loss=0.118, pruned_loss=0.01898, audio_tagging_loss=0.006935, over 16009.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.0904, pruned_loss=0.01258, audio_tagging_loss=0.008994, over 3053819.42 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:40:22,706 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.63 vs. limit=15.0 2023-11-26 08:40:39,510 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.83 vs. limit=15.0 2023-11-26 08:40:42,204 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496250 2023-11-26 08:40:42,556 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0 2023-11-26 08:40:56,170 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:40:58,070 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.751e+01 9.386e+01 1.020e+02 1.370e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 08:41:14,548 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3300, loss[loss=0.06558, simple_loss=0.0929, pruned_loss=0.01069, audio_tagging_loss=0.008439, over 15638.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.08981, pruned_loss=0.01255, audio_tagging_loss=0.009146, over 3052943.79 frames. 
], batch size: 59, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:41:21,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3308513.3333333335, ans=0.125 2023-11-26 08:41:23,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3308513.3333333335, ans=0.125 2023-11-26 08:41:23,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3308513.3333333335, ans=0.125 2023-11-26 08:41:36,837 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.36 vs. limit=12.0 2023-11-26 08:41:37,461 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496300 2023-11-26 08:41:40,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3308646.6666666665, ans=0.125 2023-11-26 08:41:45,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3308646.6666666665, ans=0.0 2023-11-26 08:41:51,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3308713.3333333335, ans=0.1 2023-11-26 08:42:04,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3308780.0, ans=0.125 2023-11-26 08:42:10,607 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3350, loss[loss=0.08173, simple_loss=0.1104, pruned_loss=0.01854, audio_tagging_loss=0.007967, over 15084.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09, pruned_loss=0.01255, audio_tagging_loss=0.009004, over 3045654.60 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:42:20,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3308913.3333333335, ans=0.0 2023-11-26 08:42:33,885 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496350 2023-11-26 08:42:35,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3308980.0, ans=0.125 2023-11-26 08:42:49,952 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:42:50,745 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.716e+01 8.810e+01 9.666e+01 1.064e+02 1.433e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-26 08:42:55,392 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.67 vs. limit=15.0 2023-11-26 08:43:04,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3309180.0, ans=0.0 2023-11-26 08:43:05,516 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3400, loss[loss=0.08813, simple_loss=0.126, pruned_loss=0.01467, audio_tagging_loss=0.01048, over 15193.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08976, pruned_loss=0.01237, audio_tagging_loss=0.008934, over 3048989.58 frames. 
], batch size: 54, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:43:07,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3309180.0, ans=0.125 2023-11-26 08:43:11,941 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.68 vs. limit=15.0 2023-11-26 08:43:13,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3309180.0, ans=0.0 2023-11-26 08:43:29,282 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496400 2023-11-26 08:43:37,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3309313.3333333335, ans=0.125 2023-11-26 08:43:39,588 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0 2023-11-26 08:43:45,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3309380.0, ans=0.125 2023-11-26 08:43:49,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3309446.6666666665, ans=0.125 2023-11-26 08:43:59,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3309446.6666666665, ans=0.0 2023-11-26 08:44:01,841 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3450, loss[loss=0.07601, simple_loss=0.1066, pruned_loss=0.01748, audio_tagging_loss=0.005232, over 15575.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08982, pruned_loss=0.01232, audio_tagging_loss=0.00878, over 3049506.98 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:44:05,188 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:44:07,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3309513.3333333335, ans=0.125 2023-11-26 08:44:08,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3309513.3333333335, ans=0.125 2023-11-26 08:44:09,440 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.70 vs. limit=12.0 2023-11-26 08:44:24,947 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496450 2023-11-26 08:44:41,766 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.803e+01 8.832e+01 9.547e+01 1.006e+02 1.211e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-26 08:44:45,337 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.37 vs. limit=5.0 2023-11-26 08:44:57,756 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3500, loss[loss=0.08288, simple_loss=0.1207, pruned_loss=0.01472, audio_tagging_loss=0.007804, over 15324.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09009, pruned_loss=0.01246, audio_tagging_loss=0.008767, over 3054216.07 frames. 
], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:45:20,664 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496500 2023-11-26 08:45:20,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3309980.0, ans=0.125 2023-11-26 08:45:25,960 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 08:45:53,132 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3550, loss[loss=0.06033, simple_loss=0.08197, pruned_loss=0.009179, audio_tagging_loss=0.01017, over 16554.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08975, pruned_loss=0.01226, audio_tagging_loss=0.008781, over 3056209.29 frames. ], batch size: 64, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:45:54,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3310180.0, ans=0.125 2023-11-26 08:45:55,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3310180.0, ans=0.125 2023-11-26 08:46:10,352 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=22.5 2023-11-26 08:46:16,832 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496550 2023-11-26 08:46:22,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3310313.3333333335, ans=0.1 2023-11-26 08:46:33,840 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.065e+01 8.432e+01 9.183e+01 9.736e+01 1.809e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-26 08:46:48,260 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3600, loss[loss=0.0602, simple_loss=0.07691, pruned_loss=0.01174, audio_tagging_loss=0.01001, over 14746.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08893, pruned_loss=0.01219, audio_tagging_loss=0.008701, over 3053644.08 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:47:03,975 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2023-11-26 08:47:12,054 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496600 2023-11-26 08:47:17,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3310646.6666666665, ans=0.125 2023-11-26 08:47:45,309 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3650, loss[loss=0.07248, simple_loss=0.1006, pruned_loss=0.01525, audio_tagging_loss=0.006941, over 15240.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.0898, pruned_loss=0.01241, audio_tagging_loss=0.00864, over 3048904.26 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0
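The scaling.py:213 lines that dominate this log each report the current value (ans) of a named ScheduledFloat: a scalar hyperparameter such as a dropout probability, a skip rate, or a balancer bound that is scheduled as a function of batch_count instead of being held constant. Below is a self-contained sketch of the piecewise-linear interpolation idea behind such a schedule; the breakpoints are invented, and the real class in icefall's scaling.py carries extra machinery (defaults, arithmetic on schedules) omitted here.

```python
# Illustrative piecewise-linear schedule in the spirit of the
# "ScheduledFloat: name=..., batch_count=..., ans=..." entries above.
# Breakpoints are made up; only the batch_count -> value idea is
# taken from the log.
from bisect import bisect_right

class PiecewiseLinearSchedule:
    def __init__(self, *points: tuple[float, float]):
        self.points = sorted(points)  # (batch_count, value) pairs

    def __call__(self, batch_count: float) -> float:
        xs = [x for x, _ in self.points]
        if batch_count <= xs[0]:
            return self.points[0][1]
        if batch_count >= xs[-1]:
            return self.points[-1][1]
        i = bisect_right(xs, batch_count) - 1
        (x0, y0), (x1, y1) = self.points[i], self.points[i + 1]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a skip rate that decays from 0.5 to 0.0 over the first 20k
# batches and then stays at 0.0, consistent with the many "ans=0.0"
# values at batch_count ~ 3.3e6 this late in training.
skip_rate = PiecewiseLinearSchedule((0.0, 0.5), (20000.0, 0.0))
print(skip_rate(3311246.6666666665))  # -> 0.0
```

The neighbouring scaling.py:1022 Whitening lines are diagnostics of a similar flavour, reporting a per-module feature-whitening metric against its configured limit ("metric=X vs. limit=Y").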
2023-11-26 08:47:47,081 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.79 vs. limit=10.0 2023-11-26 08:47:53,380 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.75 vs. limit=22.5 2023-11-26 08:48:08,439 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496650 2023-11-26 08:48:11,964 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.59 vs. limit=15.0 2023-11-26 08:48:24,513 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.47 vs. limit=15.0 2023-11-26 08:48:27,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3311046.6666666665, ans=0.125 2023-11-26 08:48:29,087 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.052e+01 8.589e+01 9.068e+01 9.988e+01 1.098e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-26 08:48:30,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3311113.3333333335, ans=0.125 2023-11-26 08:48:35,045 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.18 vs. limit=15.0 2023-11-26 08:48:40,912 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3700, loss[loss=0.04489, simple_loss=0.0552, pruned_loss=0.005349, audio_tagging_loss=0.01194, over 17112.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08997, pruned_loss=0.01238, audio_tagging_loss=0.008682, over 3050247.14 frames. ], batch size: 65, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 08:48:43,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3311180.0, ans=0.0 2023-11-26 08:48:52,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3311246.6666666665, ans=0.07 2023-11-26 08:48:54,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3311246.6666666665, ans=0.04949747468305833 2023-11-26 08:48:58,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3311246.6666666665, ans=0.2 2023-11-26 08:48:59,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3311246.6666666665, ans=0.125 2023-11-26 08:49:04,827 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496700 2023-11-26 08:49:12,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3311313.3333333335, ans=0.125 2023-11-26 08:49:15,258 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.92 vs. limit=22.5 2023-11-26 08:49:19,479 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.17 vs. limit=10.0 2023-11-26 08:49:36,479 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3750, loss[loss=0.07273, simple_loss=0.09499, pruned_loss=0.01398, audio_tagging_loss=0.01125, over 16362.00 frames.
], tot_loss[loss=0.06648, simple_loss=0.09057, pruned_loss=0.01249, audio_tagging_loss=0.008706, over 3052957.93 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 08:49:44,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3311513.3333333335, ans=0.125 2023-11-26 08:49:50,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3311580.0, ans=0.2 2023-11-26 08:49:55,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3311580.0, ans=0.2 2023-11-26 08:50:00,810 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496750 2023-11-26 08:50:13,652 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 08:50:20,470 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.645e+01 8.835e+01 9.456e+01 1.002e+02 1.375e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-26 08:50:23,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3311780.0, ans=0.1 2023-11-26 08:50:25,878 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.71 vs. limit=22.5 2023-11-26 08:50:33,738 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3800, loss[loss=0.05718, simple_loss=0.07228, pruned_loss=0.009039, audio_tagging_loss=0.012, over 15013.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09074, pruned_loss=0.01256, audio_tagging_loss=0.008828, over 3055487.69 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 08:50:52,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3311913.3333333335, ans=0.09899494936611666 2023-11-26 08:50:54,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3311980.0, ans=0.125 2023-11-26 08:50:55,772 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.22 vs. limit=15.0 2023-11-26 08:50:56,221 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496800 2023-11-26 08:50:56,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3311980.0, ans=0.0 2023-11-26 08:51:03,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3311980.0, ans=0.125 2023-11-26 08:51:11,521 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.70 vs. 
limit=15.0 2023-11-26 08:51:28,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3312180.0, ans=0.1 2023-11-26 08:51:29,054 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3850, loss[loss=0.06673, simple_loss=0.0984, pruned_loss=0.01015, audio_tagging_loss=0.007384, over 14769.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09083, pruned_loss=0.01242, audio_tagging_loss=0.008875, over 3052882.51 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 08:51:41,639 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:51:48,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3312246.6666666665, ans=0.5 2023-11-26 08:51:50,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3312313.3333333335, ans=0.0 2023-11-26 08:51:50,881 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.09 vs. limit=15.0 2023-11-26 08:51:52,556 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496850 2023-11-26 08:52:06,932 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.98 vs. limit=15.0 2023-11-26 08:52:12,780 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 8.753e+01 9.436e+01 1.032e+02 1.247e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 08:52:19,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3312446.6666666665, ans=0.0 2023-11-26 08:52:24,966 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3900, loss[loss=0.07436, simple_loss=0.1015, pruned_loss=0.01608, audio_tagging_loss=0.007521, over 15862.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09029, pruned_loss=0.01237, audio_tagging_loss=0.008943, over 3047967.93 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 08:52:32,423 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.93 vs. limit=22.5 2023-11-26 08:52:38,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3312580.0, ans=0.125 2023-11-26 08:52:46,259 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.64 vs. limit=22.5 2023-11-26 08:52:49,037 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496900 2023-11-26 08:52:49,549 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.79 vs. 
limit=22.5 2023-11-26 08:52:52,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3312646.6666666665, ans=0.1 2023-11-26 08:52:52,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3312646.6666666665, ans=0.125 2023-11-26 08:53:07,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3312713.3333333335, ans=0.0 2023-11-26 08:53:12,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3312780.0, ans=0.0 2023-11-26 08:53:21,510 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 3950, loss[loss=0.05102, simple_loss=0.06632, pruned_loss=0.007717, audio_tagging_loss=0.01014, over 15170.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09058, pruned_loss=0.01254, audio_tagging_loss=0.009024, over 3049361.01 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 08:53:31,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3312913.3333333335, ans=0.0 2023-11-26 08:53:44,603 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 496950 2023-11-26 08:53:49,699 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=22.5 2023-11-26 08:53:50,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3312980.0, ans=0.0 2023-11-26 08:54:05,140 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.800e+01 8.614e+01 9.558e+01 1.040e+02 1.255e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-26 08:54:07,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3313113.3333333335, ans=0.0 2023-11-26 08:54:17,369 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4000, loss[loss=0.0722, simple_loss=0.08935, pruned_loss=0.0177, audio_tagging_loss=0.009826, over 14551.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09104, pruned_loss=0.01261, audio_tagging_loss=0.009039, over 3045453.21 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:54:26,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3313180.0, ans=0.2 2023-11-26 08:54:27,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3313246.6666666665, ans=0.125 2023-11-26 08:54:41,090 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497000 2023-11-26 08:55:00,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3313380.0, ans=0.0 2023-11-26 08:55:04,194 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=15.0 2023-11-26 08:55:13,073 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4050, loss[loss=0.0722, simple_loss=0.09514, pruned_loss=0.01594, audio_tagging_loss=0.008692, over 14635.00 frames. ], tot_loss[loss=0.06756, simple_loss=0.09144, pruned_loss=0.01283, audio_tagging_loss=0.009011, over 3049275.15 frames. 
], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:55:14,670 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 08:55:20,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3313513.3333333335, ans=0.125 2023-11-26 08:55:22,226 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2023-11-26 08:55:33,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3313580.0, ans=0.0 2023-11-26 08:55:37,100 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497050 2023-11-26 08:55:37,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3313646.6666666665, ans=0.125 2023-11-26 08:55:48,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3313713.3333333335, ans=0.05 2023-11-26 08:55:53,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3313713.3333333335, ans=0.125 2023-11-26 08:55:57,328 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.667e+01 8.799e+01 9.457e+01 1.021e+02 1.367e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-26 08:56:03,281 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.42 vs. limit=15.0 2023-11-26 08:56:09,659 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4100, loss[loss=0.05255, simple_loss=0.07012, pruned_loss=0.008106, audio_tagging_loss=0.009382, over 14087.00 frames. ], tot_loss[loss=0.06752, simple_loss=0.09146, pruned_loss=0.01279, audio_tagging_loss=0.009002, over 3053985.52 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:56:33,308 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497100 2023-11-26 08:56:55,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3314113.3333333335, ans=0.95 2023-11-26 08:57:03,242 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.73 vs. limit=6.0 2023-11-26 08:57:05,850 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4150, loss[loss=0.06989, simple_loss=0.09106, pruned_loss=0.01449, audio_tagging_loss=0.009873, over 15383.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09093, pruned_loss=0.01269, audio_tagging_loss=0.00892, over 3057697.90 frames. 
], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:57:18,345 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:57:29,892 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497150 2023-11-26 08:57:45,169 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 08:57:45,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3314380.0, ans=0.125 2023-11-26 08:57:48,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3314380.0, ans=0.2 2023-11-26 08:57:49,402 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.533e+01 8.973e+01 9.473e+01 1.014e+02 1.383e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-26 08:58:01,786 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4200, loss[loss=0.06902, simple_loss=0.1022, pruned_loss=0.01065, audio_tagging_loss=0.007274, over 16504.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09115, pruned_loss=0.01258, audio_tagging_loss=0.008771, over 3055779.73 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:58:25,597 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497200 2023-11-26 08:58:32,800 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.71 vs. limit=15.0 2023-11-26 08:58:36,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3314713.3333333335, ans=0.125 2023-11-26 08:58:57,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3314846.6666666665, ans=0.125 2023-11-26 08:58:58,272 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4250, loss[loss=0.07903, simple_loss=0.1, pruned_loss=0.01787, audio_tagging_loss=0.01114, over 14423.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09098, pruned_loss=0.01255, audio_tagging_loss=0.008711, over 3060802.52 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:58:59,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3314846.6666666665, ans=0.07 2023-11-26 08:59:03,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3314846.6666666665, ans=0.1 2023-11-26 08:59:05,661 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.77 vs. 
limit=22.5 2023-11-26 08:59:18,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3314913.3333333335, ans=0.125 2023-11-26 08:59:21,173 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497250 2023-11-26 08:59:23,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3314980.0, ans=0.125 2023-11-26 08:59:38,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3315046.6666666665, ans=0.125 2023-11-26 08:59:38,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3315046.6666666665, ans=0.05 2023-11-26 08:59:39,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3315046.6666666665, ans=0.2 2023-11-26 08:59:41,841 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 8.666e+01 9.281e+01 9.909e+01 1.116e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 08:59:54,038 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4300, loss[loss=0.06115, simple_loss=0.08015, pruned_loss=0.01143, audio_tagging_loss=0.009641, over 15367.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09146, pruned_loss=0.01252, audio_tagging_loss=0.008639, over 3061020.82 frames. ], batch size: 63, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:00:04,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3315246.6666666665, ans=0.0 2023-11-26 09:00:17,440 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497300 2023-11-26 09:00:35,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3315380.0, ans=0.125 2023-11-26 09:00:35,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3315380.0, ans=0.125 2023-11-26 09:00:38,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3315446.6666666665, ans=0.1 2023-11-26 09:00:42,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3315446.6666666665, ans=0.1 2023-11-26 09:00:45,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3315446.6666666665, ans=0.125 2023-11-26 09:00:49,347 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4350, loss[loss=0.04507, simple_loss=0.05804, pruned_loss=0.007623, audio_tagging_loss=0.008427, over 15368.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.09222, pruned_loss=0.01261, audio_tagging_loss=0.008597, over 3050010.56 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:00:53,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3315513.3333333335, ans=0.125 2023-11-26 09:01:05,435 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.67 vs. 
limit=15.0 2023-11-26 09:01:13,986 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497350 2023-11-26 09:01:33,043 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 8.733e+01 9.373e+01 9.862e+01 1.351e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-26 09:01:46,598 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4400, loss[loss=0.06939, simple_loss=0.1042, pruned_loss=0.01145, audio_tagging_loss=0.005857, over 16105.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09214, pruned_loss=0.01263, audio_tagging_loss=0.00849, over 3052766.28 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:02:07,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3315980.0, ans=0.1 2023-11-26 09:02:09,310 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497400 2023-11-26 09:02:17,222 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:02:23,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3316046.6666666665, ans=0.2 2023-11-26 09:02:34,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3316113.3333333335, ans=0.0 2023-11-26 09:02:40,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3316113.3333333335, ans=0.1 2023-11-26 09:02:42,681 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4450, loss[loss=0.05284, simple_loss=0.06995, pruned_loss=0.007715, audio_tagging_loss=0.01015, over 14041.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09147, pruned_loss=0.01262, audio_tagging_loss=0.008589, over 3047605.49 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:03:06,114 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497450 2023-11-26 09:03:07,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3316313.3333333335, ans=0.0 2023-11-26 09:03:24,481 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.44 vs. limit=22.5 2023-11-26 09:03:26,129 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.847e+01 8.913e+01 9.793e+01 1.057e+02 1.326e+02, threshold=1.959e+02, percent-clipped=0.0 2023-11-26 09:03:29,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3316446.6666666665, ans=0.1 2023-11-26 09:03:37,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3316513.3333333335, ans=0.2 2023-11-26 09:03:37,883 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4500, loss[loss=0.04622, simple_loss=0.05927, pruned_loss=0.007116, audio_tagging_loss=0.00947, over 15174.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09122, pruned_loss=0.0126, audio_tagging_loss=0.008638, over 3050364.23 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:03:43,768 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.84 vs. 
limit=15.0 2023-11-26 09:03:49,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3316580.0, ans=0.125 2023-11-26 09:03:55,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3316580.0, ans=0.0 2023-11-26 09:03:55,931 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.26 vs. limit=15.0 2023-11-26 09:03:55,967 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.76 vs. limit=15.0 2023-11-26 09:04:02,421 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497500 2023-11-26 09:04:16,783 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.94 vs. limit=22.5 2023-11-26 09:04:33,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3316846.6666666665, ans=0.0 2023-11-26 09:04:34,321 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4550, loss[loss=0.06608, simple_loss=0.09059, pruned_loss=0.0122, audio_tagging_loss=0.008583, over 16207.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09005, pruned_loss=0.01238, audio_tagging_loss=0.00875, over 3046681.37 frames. ], batch size: 64, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:04:40,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3316846.6666666665, ans=0.05 2023-11-26 09:04:53,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3316913.3333333335, ans=0.95 2023-11-26 09:04:58,023 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497550 2023-11-26 09:05:01,695 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.12 vs. limit=15.0 2023-11-26 09:05:16,128 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 09:05:19,836 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.953e+01 8.805e+01 9.410e+01 9.881e+01 1.547e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-26 09:05:28,335 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.70 vs. limit=15.0 2023-11-26 09:05:31,125 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4600, loss[loss=0.06652, simple_loss=0.08954, pruned_loss=0.0138, audio_tagging_loss=0.007946, over 15402.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09044, pruned_loss=0.01253, audio_tagging_loss=0.008677, over 3053869.63 frames. 
], batch size: 61, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:05:44,468 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2023-11-26 09:05:45,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3317246.6666666665, ans=0.125 2023-11-26 09:05:51,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3317313.3333333335, ans=0.0 2023-11-26 09:05:53,470 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497600 2023-11-26 09:05:56,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3317313.3333333335, ans=10.0 2023-11-26 09:06:14,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3317380.0, ans=0.125 2023-11-26 09:06:20,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3317446.6666666665, ans=0.125 2023-11-26 09:06:27,037 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4650, loss[loss=0.06528, simple_loss=0.08732, pruned_loss=0.01401, audio_tagging_loss=0.007606, over 15839.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08963, pruned_loss=0.01237, audio_tagging_loss=0.008782, over 3048940.35 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:06:31,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3317513.3333333335, ans=0.0 2023-11-26 09:06:46,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3317580.0, ans=0.0 2023-11-26 09:06:48,151 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0 2023-11-26 09:06:50,795 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497650 2023-11-26 09:07:12,607 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.522e+01 8.826e+01 9.427e+01 1.038e+02 1.331e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 09:07:12,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3317780.0, ans=0.0 2023-11-26 09:07:17,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.32 vs. limit=15.0 2023-11-26 09:07:20,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3317780.0, ans=0.1 2023-11-26 09:07:22,794 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4700, loss[loss=0.07343, simple_loss=0.09983, pruned_loss=0.0139, audio_tagging_loss=0.009623, over 16006.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09019, pruned_loss=0.01244, audio_tagging_loss=0.008811, over 3047750.58 frames. 
], batch size: 62, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 09:07:28,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3317846.6666666665, ans=0.125 2023-11-26 09:07:32,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3317846.6666666665, ans=0.0 2023-11-26 09:07:33,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3317913.3333333335, ans=0.125 2023-11-26 09:07:36,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3317913.3333333335, ans=0.1 2023-11-26 09:07:41,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3317913.3333333335, ans=0.2 2023-11-26 09:07:46,145 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497700 2023-11-26 09:08:00,418 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.50 vs. limit=22.5 2023-11-26 09:08:06,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3318113.3333333335, ans=0.125 2023-11-26 09:08:13,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=3318113.3333333335, ans=0.025 2023-11-26 09:08:18,982 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4750, loss[loss=0.06812, simple_loss=0.08564, pruned_loss=0.01366, audio_tagging_loss=0.01164, over 14806.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08939, pruned_loss=0.0124, audio_tagging_loss=0.008852, over 3051854.89 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 09:08:33,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3318246.6666666665, ans=0.1 2023-11-26 09:08:35,471 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.16 vs. limit=12.0 2023-11-26 09:08:41,262 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497750 2023-11-26 09:08:47,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3318313.3333333335, ans=0.04949747468305833 2023-11-26 09:09:04,459 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.453e+01 8.590e+01 9.271e+01 9.941e+01 1.309e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-26 09:09:04,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3318446.6666666665, ans=0.1 2023-11-26 09:09:08,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=3318446.6666666665, ans=15.0 2023-11-26 09:09:14,041 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4800, loss[loss=0.07996, simple_loss=0.1132, pruned_loss=0.01735, audio_tagging_loss=0.006021, over 14518.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08929, pruned_loss=0.01253, audio_tagging_loss=0.008952, over 3047022.06 frames. 
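[Annotation] The per-batch loss breakdowns above are internally consistent with one fixed weighting: in every entry, loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss. A quick check against the batch 4750 entry above, with the 0.5 and 1.0 weights inferred from the logged totals themselves rather than read from any configuration:

simple_loss, pruned_loss, audio_tagging_loss = 0.08564, 0.01366, 0.01164
loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss
print(round(loss, 5))  # 0.06812, matching the loss= field of that entry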
], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:09:18,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3318513.3333333335, ans=0.0 2023-11-26 09:09:30,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3318580.0, ans=0.0 2023-11-26 09:09:30,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3318580.0, ans=0.2 2023-11-26 09:09:35,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3318646.6666666665, ans=15.0 2023-11-26 09:09:36,980 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.30 vs. limit=15.0 2023-11-26 09:09:37,492 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497800 2023-11-26 09:09:45,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3318646.6666666665, ans=0.125 2023-11-26 09:10:10,118 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4850, loss[loss=0.06551, simple_loss=0.09046, pruned_loss=0.0127, audio_tagging_loss=0.007588, over 14771.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08944, pruned_loss=0.01242, audio_tagging_loss=0.009007, over 3037682.43 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:10:17,481 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.74 vs. limit=15.0 2023-11-26 09:10:17,722 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.42 vs. limit=15.0 2023-11-26 09:10:33,985 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497850 2023-11-26 09:10:35,683 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.10 vs. limit=15.0 2023-11-26 09:10:36,774 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.58 vs. limit=15.0 2023-11-26 09:10:52,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3319046.6666666665, ans=0.1 2023-11-26 09:10:55,695 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.411e+01 8.672e+01 9.359e+01 1.001e+02 1.200e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-26 09:11:06,514 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4900, loss[loss=0.05695, simple_loss=0.07881, pruned_loss=0.01035, audio_tagging_loss=0.007194, over 14216.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08888, pruned_loss=0.01235, audio_tagging_loss=0.009113, over 3041562.26 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:11:09,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3319180.0, ans=0.125 2023-11-26 09:11:23,179 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.55 vs. 
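[Annotation] The grad_scale field bouncing between 8.0, 16.0 and 32.0 across these entries is the standard dynamic loss-scaling behaviour of mixed-precision training: the scale is halved whenever a step produces inf/nan gradients and grown again after a long enough run of clean steps. A minimal, self-contained usage sketch with torch.cuda.amp; the init_scale=32.0 and growth_interval=2000 values are illustrative assumptions, not taken from this run.

import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"
model = nn.Linear(10, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000,
                                   enabled=use_amp)

for _ in range(3):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(enabled=use_amp):
        loss = model(torch.randn(8, 10, device=device)).pow(2).mean()
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skips the step if the grads overflowed
    scaler.update()          # halve on overflow, grow after clean steps
    print(scaler.get_scale())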
limit=22.5 2023-11-26 09:11:28,838 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497900 2023-11-26 09:11:38,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3319380.0, ans=0.2 2023-11-26 09:11:51,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3319446.6666666665, ans=0.125 2023-11-26 09:12:01,646 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 4950, loss[loss=0.06804, simple_loss=0.08988, pruned_loss=0.01456, audio_tagging_loss=0.008538, over 15547.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.08989, pruned_loss=0.0125, audio_tagging_loss=0.008987, over 3038614.52 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:12:24,934 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 497950 2023-11-26 09:12:26,435 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.02 vs. limit=15.0 2023-11-26 09:12:34,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3319713.3333333335, ans=0.1 2023-11-26 09:12:38,724 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:12:46,485 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.20 vs. limit=22.5 2023-11-26 09:12:46,995 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.442e+01 8.822e+01 9.528e+01 1.003e+02 1.233e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 09:12:50,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3319780.0, ans=0.125 2023-11-26 09:12:56,702 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5000, loss[loss=0.08299, simple_loss=0.1158, pruned_loss=0.01782, audio_tagging_loss=0.007269, over 14750.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.0895, pruned_loss=0.01245, audio_tagging_loss=0.00878, over 3036526.58 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:13:21,433 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498000 2023-11-26 09:13:27,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3319980.0, ans=0.125 2023-11-26 09:13:30,878 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.24 vs. limit=15.0 2023-11-26 09:13:33,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3320046.6666666665, ans=0.2 2023-11-26 09:13:36,434 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.28 vs. 
limit=8.0 2023-11-26 09:13:36,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3320046.6666666665, ans=0.0 2023-11-26 09:13:44,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3320113.3333333335, ans=0.2 2023-11-26 09:13:52,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3320180.0, ans=0.1 2023-11-26 09:13:53,748 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5050, loss[loss=0.07643, simple_loss=0.1055, pruned_loss=0.01662, audio_tagging_loss=0.007037, over 15756.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09002, pruned_loss=0.01256, audio_tagging_loss=0.008672, over 3033223.13 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:13:53,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3320180.0, ans=0.04949747468305833 2023-11-26 09:14:17,106 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498050 2023-11-26 09:14:36,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3320380.0, ans=0.125 2023-11-26 09:14:39,864 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.754e+01 8.596e+01 9.338e+01 9.985e+01 1.178e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 09:14:47,317 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.26 vs. limit=15.0 2023-11-26 09:14:49,965 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5100, loss[loss=0.07502, simple_loss=0.1045, pruned_loss=0.01415, audio_tagging_loss=0.008635, over 15774.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08971, pruned_loss=0.01246, audio_tagging_loss=0.008669, over 3035268.31 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:14:58,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3320513.3333333335, ans=0.125 2023-11-26 09:15:12,787 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498100 2023-11-26 09:15:12,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3320646.6666666665, ans=0.0 2023-11-26 09:15:14,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3320646.6666666665, ans=0.05 2023-11-26 09:15:19,048 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.09 vs. limit=15.0 2023-11-26 09:15:45,336 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5150, loss[loss=0.08517, simple_loss=0.1161, pruned_loss=0.01967, audio_tagging_loss=0.00745, over 15541.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09045, pruned_loss=0.01239, audio_tagging_loss=0.008656, over 3038565.49 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:15:48,795 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.84 vs. 
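[Annotation] The many ScheduledFloat entries record hyperparameters (dropout probabilities, skip rates, bypass scale minima) that are functions of the global batch count; by batch_count around 3.3e6 they have all long since settled at their final values, which is why dropout_p keeps printing ans=0.1. A toy piecewise-linear version keyed on (batch_count, value) breakpoints; the breakpoints below are illustrative, and the real ScheduledFloat in icefall's scaling.py is more featureful.

class ToyScheduledFloat:
    def __init__(self, *points):
        # points: (batch_count, value) pairs defining a piecewise-linear curve
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                # linear interpolation between neighbouring breakpoints
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

dropout_p = ToyScheduledFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(3315980.0))  # 0.1: pinned at its final value, as logged above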
limit=15.0 2023-11-26 09:15:51,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3320846.6666666665, ans=0.2 2023-11-26 09:15:52,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3320846.6666666665, ans=0.0 2023-11-26 09:16:09,430 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498150 2023-11-26 09:16:10,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3320980.0, ans=0.025 2023-11-26 09:16:31,746 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.763e+01 8.833e+01 9.269e+01 1.033e+02 1.245e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-26 09:16:41,827 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5200, loss[loss=0.07112, simple_loss=0.09792, pruned_loss=0.01294, audio_tagging_loss=0.009225, over 15537.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09036, pruned_loss=0.01241, audio_tagging_loss=0.008685, over 3040750.65 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:16:51,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3321180.0, ans=0.2 2023-11-26 09:16:57,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3321246.6666666665, ans=0.1 2023-11-26 09:17:04,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3321313.3333333335, ans=0.125 2023-11-26 09:17:05,262 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498200 2023-11-26 09:17:11,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3321313.3333333335, ans=0.125 2023-11-26 09:17:22,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3321380.0, ans=0.125 2023-11-26 09:17:27,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3321446.6666666665, ans=0.1 2023-11-26 09:17:37,327 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.37 vs. limit=15.0 2023-11-26 09:17:37,759 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5250, loss[loss=0.08628, simple_loss=0.1226, pruned_loss=0.01649, audio_tagging_loss=0.008466, over 15961.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08965, pruned_loss=0.01248, audio_tagging_loss=0.008751, over 3037503.62 frames. 
], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:17:37,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3321513.3333333335, ans=0.125 2023-11-26 09:17:43,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3321513.3333333335, ans=0.0 2023-11-26 09:17:46,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3321513.3333333335, ans=0.1 2023-11-26 09:17:50,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3321580.0, ans=0.05 2023-11-26 09:18:01,139 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498250 2023-11-26 09:18:20,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3321713.3333333335, ans=0.0 2023-11-26 09:18:23,386 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.448e+01 8.788e+01 9.359e+01 1.015e+02 1.795e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-26 09:18:23,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3321780.0, ans=0.2 2023-11-26 09:18:26,653 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.13 vs. limit=15.0 2023-11-26 09:18:33,632 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5300, loss[loss=0.06031, simple_loss=0.07615, pruned_loss=0.01373, audio_tagging_loss=0.008512, over 15542.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08891, pruned_loss=0.01224, audio_tagging_loss=0.008819, over 3036002.05 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:18:57,465 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498300 2023-11-26 09:19:06,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3322046.6666666665, ans=0.125 2023-11-26 09:19:07,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3322046.6666666665, ans=0.125 2023-11-26 09:19:29,804 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5350, loss[loss=0.05587, simple_loss=0.06965, pruned_loss=0.00976, audio_tagging_loss=0.01128, over 16008.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08799, pruned_loss=0.0122, audio_tagging_loss=0.008924, over 3039898.12 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:19:44,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3322246.6666666665, ans=0.2 2023-11-26 09:19:48,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3322246.6666666665, ans=0.0 2023-11-26 09:19:50,988 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.47 vs. 
limit=12.0 2023-11-26 09:19:53,285 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498350 2023-11-26 09:19:57,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3322313.3333333335, ans=0.2 2023-11-26 09:19:58,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3322313.3333333335, ans=0.125 2023-11-26 09:20:13,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3322446.6666666665, ans=0.2 2023-11-26 09:20:16,465 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.299e+01 8.636e+01 9.507e+01 1.021e+02 1.196e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-26 09:20:25,607 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5400, loss[loss=0.07721, simple_loss=0.09273, pruned_loss=0.01964, audio_tagging_loss=0.01121, over 14946.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08946, pruned_loss=0.0125, audio_tagging_loss=0.008861, over 3041186.94 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:20:27,118 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.37 vs. limit=15.0 2023-11-26 09:20:32,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3322513.3333333335, ans=0.0 2023-11-26 09:20:42,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3322580.0, ans=0.015 2023-11-26 09:20:48,973 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498400 2023-11-26 09:21:10,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3322780.0, ans=0.0 2023-11-26 09:21:20,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3322846.6666666665, ans=0.2 2023-11-26 09:21:20,966 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5450, loss[loss=0.07637, simple_loss=0.1118, pruned_loss=0.0142, audio_tagging_loss=0.006249, over 15537.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08992, pruned_loss=0.01246, audio_tagging_loss=0.008793, over 3046203.31 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:21:27,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3322846.6666666665, ans=0.0 2023-11-26 09:21:45,566 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498450 2023-11-26 09:21:52,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3322980.0, ans=0.0 2023-11-26 09:22:05,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3323113.3333333335, ans=0.125 2023-11-26 09:22:08,290 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.192e+01 8.615e+01 9.390e+01 1.017e+02 1.312e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-26 09:22:17,205 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5500, loss[loss=0.06476, simple_loss=0.08069, pruned_loss=0.0137, audio_tagging_loss=0.01072, over 14655.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.0889, pruned_loss=0.01238, audio_tagging_loss=0.008971, over 3038922.60 frames. 
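[Annotation] The Whitening entries are diagnostics from regularizers that push intermediate activations toward a white (isotropic) covariance; the logged metric quantifies how anisotropic the current feature covariance is relative to a limit. A toy proxy for such a metric, assuming it is the ratio of the mean squared covariance eigenvalue to the squared mean eigenvalue (exactly 1.0 for perfectly white features); the precise formula in icefall's scaling.py may differ.

import torch

def toy_whitening_metric(feats: torch.Tensor) -> float:
    # feats: (num_frames, num_channels). Returns 1.0 iff the covariance is a
    # multiple of the identity, and grows as the eigenvalue spread widens.
    x = feats - feats.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

print(toy_whitening_metric(torch.randn(2000, 512)))                     # ~1: white
print(toy_whitening_metric(torch.randn(2000, 8) @ torch.randn(8, 512))) # large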
], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:22:17,810 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.47 vs. limit=22.5 2023-11-26 09:22:37,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3323246.6666666665, ans=0.0 2023-11-26 09:22:40,727 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498500 2023-11-26 09:22:42,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3323313.3333333335, ans=0.0 2023-11-26 09:22:52,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3323380.0, ans=0.0 2023-11-26 09:22:53,259 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=15.0 2023-11-26 09:23:13,564 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5550, loss[loss=0.05857, simple_loss=0.0783, pruned_loss=0.009909, audio_tagging_loss=0.009513, over 15836.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08957, pruned_loss=0.01246, audio_tagging_loss=0.008973, over 3035357.56 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:23:13,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3323513.3333333335, ans=0.0 2023-11-26 09:23:20,337 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=15.0 2023-11-26 09:23:21,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3323513.3333333335, ans=0.125 2023-11-26 09:23:26,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3323580.0, ans=0.125 2023-11-26 09:23:30,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3323580.0, ans=0.125 2023-11-26 09:23:32,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3323580.0, ans=0.0 2023-11-26 09:23:36,483 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498550 2023-11-26 09:23:44,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3323646.6666666665, ans=10.0 2023-11-26 09:23:50,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3323713.3333333335, ans=0.125 2023-11-26 09:24:00,244 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.769e+01 8.665e+01 9.160e+01 9.875e+01 1.167e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-26 09:24:00,538 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:24:04,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3323780.0, ans=0.125 2023-11-26 09:24:08,702 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5600, loss[loss=0.06341, simple_loss=0.08485, pruned_loss=0.01131, audio_tagging_loss=0.009678, over 14662.00 frames. 
], tot_loss[loss=0.06642, simple_loss=0.09, pruned_loss=0.01238, audio_tagging_loss=0.009042, over 3044840.22 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:24:13,506 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.68 vs. limit=22.5 2023-11-26 09:24:14,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3323846.6666666665, ans=0.125 2023-11-26 09:24:32,121 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498600 2023-11-26 09:24:47,752 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 09:24:54,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3324113.3333333335, ans=10.0 2023-11-26 09:24:54,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3324113.3333333335, ans=0.07 2023-11-26 09:25:01,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3324113.3333333335, ans=0.2 2023-11-26 09:25:04,650 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5650, loss[loss=0.04384, simple_loss=0.05161, pruned_loss=0.006849, audio_tagging_loss=0.01118, over 14615.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09001, pruned_loss=0.01235, audio_tagging_loss=0.009111, over 3045766.47 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:25:08,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3324180.0, ans=0.125 2023-11-26 09:25:22,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3324246.6666666665, ans=0.125 2023-11-26 09:25:24,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3324246.6666666665, ans=0.125 2023-11-26 09:25:27,976 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498650 2023-11-26 09:25:43,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3324380.0, ans=0.025 2023-11-26 09:25:51,529 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.459e+01 8.711e+01 9.394e+01 1.016e+02 1.261e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-26 09:25:52,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3324446.6666666665, ans=0.1 2023-11-26 09:26:00,615 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5700, loss[loss=0.07225, simple_loss=0.09696, pruned_loss=0.01529, audio_tagging_loss=0.008482, over 15920.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09067, pruned_loss=0.01247, audio_tagging_loss=0.009041, over 3052607.53 frames. 
], batch size: 59, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:26:11,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3324580.0, ans=0.035 2023-11-26 09:26:22,931 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498700 2023-11-26 09:26:31,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3324646.6666666665, ans=0.125 2023-11-26 09:26:39,970 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.70 vs. limit=22.5 2023-11-26 09:26:45,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3324780.0, ans=0.125 2023-11-26 09:26:55,476 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5750, loss[loss=0.06931, simple_loss=0.09091, pruned_loss=0.01597, audio_tagging_loss=0.007889, over 15996.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08931, pruned_loss=0.0123, audio_tagging_loss=0.008941, over 3051933.82 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:27:00,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3324846.6666666665, ans=0.0 2023-11-26 09:27:19,397 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498750 2023-11-26 09:27:37,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3325046.6666666665, ans=0.125 2023-11-26 09:27:43,274 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.688e+01 9.672e+01 1.037e+02 1.412e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-26 09:27:48,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3325113.3333333335, ans=0.125 2023-11-26 09:27:51,156 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5800, loss[loss=0.05877, simple_loss=0.08223, pruned_loss=0.006729, audio_tagging_loss=0.01093, over 16027.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08948, pruned_loss=0.01236, audio_tagging_loss=0.008881, over 3046048.59 frames. 
], batch size: 61, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:27:51,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3325180.0, ans=0.2 2023-11-26 09:27:54,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3325180.0, ans=0.1 2023-11-26 09:28:08,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3325246.6666666665, ans=0.0 2023-11-26 09:28:15,010 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498800 2023-11-26 09:28:18,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3325313.3333333335, ans=0.125 2023-11-26 09:28:21,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3325313.3333333335, ans=0.125 2023-11-26 09:28:23,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3325380.0, ans=0.0 2023-11-26 09:28:23,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3325380.0, ans=0.0 2023-11-26 09:28:25,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3325380.0, ans=0.0 2023-11-26 09:28:32,742 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:28:37,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3325446.6666666665, ans=0.1 2023-11-26 09:28:46,782 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5850, loss[loss=0.0739, simple_loss=0.09167, pruned_loss=0.01642, audio_tagging_loss=0.01165, over 14846.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09029, pruned_loss=0.0124, audio_tagging_loss=0.00873, over 3040729.41 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:29:09,681 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498850 2023-11-26 09:29:16,946 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.82 vs. limit=15.0 2023-11-26 09:29:24,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3325713.3333333335, ans=0.125 2023-11-26 09:29:35,027 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.507e+01 8.712e+01 9.460e+01 1.009e+02 5.552e+02, threshold=1.892e+02, percent-clipped=1.0 2023-11-26 09:29:42,428 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5900, loss[loss=0.06282, simple_loss=0.09242, pruned_loss=0.008621, audio_tagging_loss=0.007987, over 14485.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09078, pruned_loss=0.01241, audio_tagging_loss=0.008673, over 3036831.60 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:29:42,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3325846.6666666665, ans=0.125 2023-11-26 09:29:52,397 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.27 vs. 
limit=12.0 2023-11-26 09:29:53,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3325913.3333333335, ans=0.125 2023-11-26 09:30:02,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3325913.3333333335, ans=0.125 2023-11-26 09:30:05,947 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498900 2023-11-26 09:30:09,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3325980.0, ans=0.0 2023-11-26 09:30:13,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3325980.0, ans=0.125 2023-11-26 09:30:14,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3325980.0, ans=0.1 2023-11-26 09:30:21,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3326046.6666666665, ans=0.5 2023-11-26 09:30:27,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3326113.3333333335, ans=0.125 2023-11-26 09:30:29,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3326113.3333333335, ans=0.125 2023-11-26 09:30:37,534 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 5950, loss[loss=0.06294, simple_loss=0.08492, pruned_loss=0.009575, audio_tagging_loss=0.01091, over 16269.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09017, pruned_loss=0.0123, audio_tagging_loss=0.008722, over 3040938.62 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:30:46,039 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.01 vs. limit=15.0 2023-11-26 09:30:55,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3326246.6666666665, ans=0.125 2023-11-26 09:31:01,974 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 498950 2023-11-26 09:31:02,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3326313.3333333335, ans=0.125 2023-11-26 09:31:04,476 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.57 vs. limit=15.0 2023-11-26 09:31:05,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3326313.3333333335, ans=0.125 2023-11-26 09:31:23,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3326446.6666666665, ans=0.125 2023-11-26 09:31:25,837 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.496e+01 8.795e+01 9.324e+01 1.011e+02 1.404e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 09:31:33,848 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6000, loss[loss=0.06013, simple_loss=0.07968, pruned_loss=0.009107, audio_tagging_loss=0.01119, over 15573.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09081, pruned_loss=0.01247, audio_tagging_loss=0.008689, over 3044186.85 frames. 
], batch size: 59, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:31:33,851 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-26 09:31:55,111 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8156, 4.9664, 5.1027, 4.8958], device='cuda:0') 2023-11-26 09:31:57,317 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.8019, 5.8313, 5.8878, 5.8169], device='cuda:0') 2023-11-26 09:32:01,838 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.8066, 5.8531, 5.8964, 5.8793], device='cuda:0') 2023-11-26 09:32:06,559 INFO [train_asr.py:1267] (0/4) Epoch 42, validation: loss=0.05807, simple_loss=0.05064, pruned_loss=0.005286, audio_tagging_loss=0.02746, over 4681554.00 frames. 2023-11-26 09:32:06,559 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-26 09:32:19,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3326580.0, ans=0.0 2023-11-26 09:32:29,927 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499000 2023-11-26 09:32:45,650 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 09:32:46,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3326713.3333333335, ans=0.1 2023-11-26 09:32:52,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3326780.0, ans=0.1 2023-11-26 09:33:01,924 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6050, loss[loss=0.07082, simple_loss=0.09664, pruned_loss=0.01548, audio_tagging_loss=0.007024, over 15226.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08944, pruned_loss=0.0124, audio_tagging_loss=0.008799, over 3041285.78 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:33:06,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3326846.6666666665, ans=0.125 2023-11-26 09:33:10,203 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:33:17,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3326913.3333333335, ans=0.2 2023-11-26 09:33:25,938 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499050 2023-11-26 09:33:31,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3326980.0, ans=0.125 2023-11-26 09:33:34,091 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.83 vs. 
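[Annotation] During the validation pass above, the zipformer diagnostics print attn_weights_entropy, the per-head entropy of self-attention weights (four heads per logged module, values around 4.8 to 5.9 nats); entropies collapsing toward zero would indicate heads locking onto single positions. A small sketch of that statistic, assuming weights shaped (num_heads, query_len, key_len) whose rows sum to 1:

import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # Per-head entropy in nats, averaged over queries: one value per head,
    # like the 4-element tensors in the validation log above.
    return -(attn * (attn + 1e-20).log()).sum(dim=-1).mean(dim=-1)

attn = torch.softmax(torch.randn(4, 100, 400), dim=-1)
print(attn_weights_entropy(attn))  # close to log(400) ~ 5.99 for diffuse rows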
limit=6.0 2023-11-26 09:33:41,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3327046.6666666665, ans=0.125 2023-11-26 09:33:44,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3327046.6666666665, ans=0.125 2023-11-26 09:33:49,741 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.079e+01 8.694e+01 9.426e+01 1.019e+02 1.507e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 09:33:58,294 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6100, loss[loss=0.07788, simple_loss=0.1182, pruned_loss=0.0116, audio_tagging_loss=0.007174, over 16977.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08994, pruned_loss=0.01235, audio_tagging_loss=0.008724, over 3042912.68 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:34:01,549 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.15 vs. limit=22.5 2023-11-26 09:34:08,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3327246.6666666665, ans=0.125 2023-11-26 09:34:21,294 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499100 2023-11-26 09:34:54,322 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6150, loss[loss=0.06716, simple_loss=0.09427, pruned_loss=0.01453, audio_tagging_loss=0.005502, over 14457.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08968, pruned_loss=0.01237, audio_tagging_loss=0.008775, over 3044458.74 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:35:05,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3327580.0, ans=0.0 2023-11-26 09:35:07,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3327580.0, ans=0.0 2023-11-26 09:35:17,632 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499150 2023-11-26 09:35:23,793 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.71 vs. limit=15.0 2023-11-26 09:35:39,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3327780.0, ans=0.04949747468305833 2023-11-26 09:35:40,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3327780.0, ans=0.125 2023-11-26 09:35:42,931 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 8.630e+01 9.202e+01 1.002e+02 1.257e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-26 09:35:48,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3327846.6666666665, ans=0.0 2023-11-26 09:35:49,821 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6200, loss[loss=0.08633, simple_loss=0.1226, pruned_loss=0.01899, audio_tagging_loss=0.006034, over 15996.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08953, pruned_loss=0.01243, audio_tagging_loss=0.008842, over 3042692.18 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:36:07,700 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.67 vs. 
limit=10.0 2023-11-26 09:36:08,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3327913.3333333335, ans=0.125 2023-11-26 09:36:09,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3327913.3333333335, ans=0.0 2023-11-26 09:36:13,307 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499200 2023-11-26 09:36:30,861 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.77 vs. limit=22.5 2023-11-26 09:36:34,412 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.55 vs. limit=15.0 2023-11-26 09:36:42,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3328113.3333333335, ans=0.125 2023-11-26 09:36:46,677 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6250, loss[loss=0.0665, simple_loss=0.08934, pruned_loss=0.0125, audio_tagging_loss=0.009331, over 15899.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.08976, pruned_loss=0.01242, audio_tagging_loss=0.008958, over 3053761.04 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:37:09,553 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499250 2023-11-26 09:37:35,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3328446.6666666665, ans=0.125 2023-11-26 09:37:36,549 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.201e+01 8.808e+01 9.356e+01 1.010e+02 1.714e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 09:37:42,474 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6300, loss[loss=0.06682, simple_loss=0.08953, pruned_loss=0.01299, audio_tagging_loss=0.009058, over 15224.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08931, pruned_loss=0.01234, audio_tagging_loss=0.009074, over 3053884.38 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 09:37:55,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3328580.0, ans=0.125 2023-11-26 09:38:05,846 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499300 2023-11-26 09:38:06,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3328646.6666666665, ans=0.07 2023-11-26 09:38:07,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3328646.6666666665, ans=0.1 2023-11-26 09:38:11,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3328646.6666666665, ans=0.0 2023-11-26 09:38:19,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3328713.3333333335, ans=0.2 2023-11-26 09:38:34,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3328780.0, ans=0.0 2023-11-26 09:38:37,416 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6350, loss[loss=0.06858, simple_loss=0.09873, pruned_loss=0.01027, audio_tagging_loss=0.008949, over 15985.00 frames. 
], tot_loss[loss=0.06626, simple_loss=0.08975, pruned_loss=0.01229, audio_tagging_loss=0.009091, over 3048412.13 frames. ], batch size: 63, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 09:38:46,850 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.62 vs. limit=15.0 2023-11-26 09:38:53,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3328913.3333333335, ans=0.1 2023-11-26 09:39:01,253 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499350 2023-11-26 09:39:19,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3329046.6666666665, ans=0.2 2023-11-26 09:39:27,766 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.655e+01 8.815e+01 9.462e+01 1.012e+02 1.581e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-26 09:39:28,430 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.77 vs. limit=15.0 2023-11-26 09:39:29,215 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.83 vs. limit=10.0 2023-11-26 09:39:33,588 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6400, loss[loss=0.06181, simple_loss=0.09025, pruned_loss=0.008977, audio_tagging_loss=0.007707, over 14586.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.08997, pruned_loss=0.01229, audio_tagging_loss=0.009089, over 3038636.17 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:39:42,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3329180.0, ans=0.125 2023-11-26 09:39:56,853 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499400 2023-11-26 09:39:58,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3329313.3333333335, ans=0.1 2023-11-26 09:40:02,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3329313.3333333335, ans=0.0 2023-11-26 09:40:08,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3329380.0, ans=0.125 2023-11-26 09:40:27,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3329446.6666666665, ans=0.125 2023-11-26 09:40:29,272 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6450, loss[loss=0.05739, simple_loss=0.07845, pruned_loss=0.008507, audio_tagging_loss=0.009658, over 14249.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09123, pruned_loss=0.01243, audio_tagging_loss=0.009148, over 3036059.02 frames. 
], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:40:40,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3329580.0, ans=0.125 2023-11-26 09:40:47,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3329580.0, ans=0.125 2023-11-26 09:40:52,735 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499450 2023-11-26 09:40:52,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3329646.6666666665, ans=0.1 2023-11-26 09:40:58,114 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.77 vs. limit=15.0 2023-11-26 09:41:19,119 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.267e+01 8.730e+01 9.364e+01 9.937e+01 1.364e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 09:41:25,100 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6500, loss[loss=0.04659, simple_loss=0.05977, pruned_loss=0.005685, audio_tagging_loss=0.01102, over 14526.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09006, pruned_loss=0.01226, audio_tagging_loss=0.00923, over 3032115.42 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:41:38,435 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.83 vs. limit=10.0 2023-11-26 09:41:48,809 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499500 2023-11-26 09:42:20,858 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6550, loss[loss=0.06208, simple_loss=0.08538, pruned_loss=0.009616, audio_tagging_loss=0.00977, over 15584.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08981, pruned_loss=0.01215, audio_tagging_loss=0.00896, over 3033439.54 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:42:27,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3330180.0, ans=0.125 2023-11-26 09:42:36,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3330246.6666666665, ans=0.125 2023-11-26 09:42:41,144 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.15 vs. limit=15.0 2023-11-26 09:42:43,925 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499550 2023-11-26 09:42:49,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=3330313.3333333335, ans=0.02 2023-11-26 09:42:54,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3330380.0, ans=0.125 2023-11-26 09:42:58,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3330380.0, ans=0.0 2023-11-26 09:43:02,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3330380.0, ans=0.0 2023-11-26 09:43:09,659 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.27 vs. 
limit=12.0 2023-11-26 09:43:11,435 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 8.534e+01 9.204e+01 9.909e+01 1.481e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-26 09:43:15,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3330513.3333333335, ans=0.2 2023-11-26 09:43:16,671 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6600, loss[loss=0.07916, simple_loss=0.1084, pruned_loss=0.01779, audio_tagging_loss=0.007176, over 14825.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08986, pruned_loss=0.0123, audio_tagging_loss=0.008872, over 3036544.39 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:43:17,294 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.86 vs. limit=10.0 2023-11-26 09:43:35,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3330580.0, ans=0.0 2023-11-26 09:43:39,884 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499600 2023-11-26 09:43:48,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3330646.6666666665, ans=0.125 2023-11-26 09:43:51,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3330713.3333333335, ans=0.125 2023-11-26 09:43:59,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3330713.3333333335, ans=0.125 2023-11-26 09:44:08,226 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=15.0 2023-11-26 09:44:11,840 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6650, loss[loss=0.0822, simple_loss=0.1134, pruned_loss=0.0174, audio_tagging_loss=0.008119, over 15568.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08995, pruned_loss=0.01229, audio_tagging_loss=0.008843, over 3037816.92 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:44:20,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3330846.6666666665, ans=0.0 2023-11-26 09:44:36,717 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499650 2023-11-26 09:45:02,563 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.147e+01 8.692e+01 9.274e+01 1.004e+02 1.194e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 09:45:07,937 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6700, loss[loss=0.08186, simple_loss=0.1069, pruned_loss=0.01892, audio_tagging_loss=0.009461, over 14923.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.0904, pruned_loss=0.01253, audio_tagging_loss=0.008747, over 3038833.63 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:45:09,599 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.52 vs. 
limit=22.5 2023-11-26 09:45:10,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3331180.0, ans=0.0 2023-11-26 09:45:20,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3331246.6666666665, ans=0.1 2023-11-26 09:45:22,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3331246.6666666665, ans=0.0 2023-11-26 09:45:29,745 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.45 vs. limit=15.0 2023-11-26 09:45:31,205 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499700 2023-11-26 09:45:39,256 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.80 vs. limit=15.0 2023-11-26 09:45:41,293 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.81 vs. limit=15.0 2023-11-26 09:45:45,546 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. limit=6.0 2023-11-26 09:45:50,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3331380.0, ans=0.125 2023-11-26 09:45:54,545 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.30 vs. limit=10.0 2023-11-26 09:45:55,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3331446.6666666665, ans=0.125 2023-11-26 09:46:04,040 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6750, loss[loss=0.0519, simple_loss=0.07137, pruned_loss=0.008126, audio_tagging_loss=0.008083, over 16068.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09016, pruned_loss=0.01265, audio_tagging_loss=0.0088, over 3044569.60 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:46:26,955 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499750 2023-11-26 09:46:35,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3331646.6666666665, ans=0.2 2023-11-26 09:46:43,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3331713.3333333335, ans=0.04949747468305833 2023-11-26 09:46:53,938 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.795e+01 9.376e+01 1.027e+02 1.751e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-26 09:46:59,202 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6800, loss[loss=0.05688, simple_loss=0.07855, pruned_loss=0.01018, audio_tagging_loss=0.007425, over 14277.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.0902, pruned_loss=0.01254, audio_tagging_loss=0.008708, over 3036279.78 frames. 
], batch size: 54, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:47:01,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3331846.6666666665, ans=0.1 2023-11-26 09:47:04,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3331846.6666666665, ans=0.1 2023-11-26 09:47:14,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3331913.3333333335, ans=0.125 2023-11-26 09:47:14,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3331913.3333333335, ans=0.2 2023-11-26 09:47:23,604 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499800 2023-11-26 09:47:37,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3332046.6666666665, ans=0.125 2023-11-26 09:47:55,090 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6850, loss[loss=0.09401, simple_loss=0.1263, pruned_loss=0.02517, audio_tagging_loss=0.005665, over 14025.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.0909, pruned_loss=0.01261, audio_tagging_loss=0.008666, over 3033461.65 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:48:06,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3332246.6666666665, ans=0.2 2023-11-26 09:48:18,876 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499850 2023-11-26 09:48:24,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3332313.3333333335, ans=0.125 2023-11-26 09:48:25,953 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.96 vs. limit=15.0 2023-11-26 09:48:29,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3332380.0, ans=0.07 2023-11-26 09:48:29,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3332380.0, ans=0.04949747468305833 2023-11-26 09:48:39,185 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.88 vs. limit=22.5 2023-11-26 09:48:43,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3332446.6666666665, ans=0.0 2023-11-26 09:48:45,308 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.499e+01 8.735e+01 9.366e+01 1.004e+02 1.286e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 09:48:46,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3332446.6666666665, ans=0.1 2023-11-26 09:48:48,085 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:48:51,079 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6900, loss[loss=0.05401, simple_loss=0.06523, pruned_loss=0.01122, audio_tagging_loss=0.01017, over 15106.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09104, pruned_loss=0.01253, audio_tagging_loss=0.008632, over 3037474.49 frames. 
], batch size: 59, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:48:51,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3332513.3333333335, ans=0.125 2023-11-26 09:48:52,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3332513.3333333335, ans=0.125 2023-11-26 09:49:13,453 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499900 2023-11-26 09:49:17,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3332646.6666666665, ans=0.0 2023-11-26 09:49:30,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3332713.3333333335, ans=0.1 2023-11-26 09:49:32,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3332713.3333333335, ans=0.125 2023-11-26 09:49:33,064 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 09:49:45,790 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 6950, loss[loss=0.0669, simple_loss=0.09445, pruned_loss=0.01389, audio_tagging_loss=0.005792, over 13886.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09134, pruned_loss=0.01253, audio_tagging_loss=0.008592, over 3035322.20 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:49:46,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3332846.6666666665, ans=0.0 2023-11-26 09:49:56,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3332913.3333333335, ans=0.125 2023-11-26 09:50:03,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3332913.3333333335, ans=0.125 2023-11-26 09:50:09,627 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 499950 2023-11-26 09:50:09,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3332980.0, ans=0.015 2023-11-26 09:50:17,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3332980.0, ans=0.0 2023-11-26 09:50:25,817 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.27 vs. limit=22.5 2023-11-26 09:50:35,561 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.592e+01 8.609e+01 9.217e+01 1.000e+02 1.228e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-26 09:50:41,407 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7000, loss[loss=0.08263, simple_loss=0.1176, pruned_loss=0.01798, audio_tagging_loss=0.005854, over 15532.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09061, pruned_loss=0.0126, audio_tagging_loss=0.008702, over 3034202.52 frames. 
], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:50:58,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3333246.6666666665, ans=0.2 2023-11-26 09:51:02,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3333246.6666666665, ans=0.125 2023-11-26 09:51:05,436 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500000 2023-11-26 09:51:06,769 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-500000.pt 2023-11-26 09:51:12,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3333313.3333333335, ans=0.125 2023-11-26 09:51:12,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3333313.3333333335, ans=0.125 2023-11-26 09:51:18,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3333380.0, ans=0.1 2023-11-26 09:51:19,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3333380.0, ans=0.0 2023-11-26 09:51:32,740 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.39 vs. limit=15.0 2023-11-26 09:51:40,246 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7050, loss[loss=0.05239, simple_loss=0.06872, pruned_loss=0.009836, audio_tagging_loss=0.008197, over 14734.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08965, pruned_loss=0.01251, audio_tagging_loss=0.008828, over 3030814.12 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:51:44,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3333513.3333333335, ans=0.125 2023-11-26 09:51:47,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3333513.3333333335, ans=0.0 2023-11-26 09:52:02,572 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500050 2023-11-26 09:52:03,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3333646.6666666665, ans=0.125 2023-11-26 09:52:25,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3333780.0, ans=0.125 2023-11-26 09:52:26,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3333780.0, ans=0.1 2023-11-26 09:52:31,145 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.330e+01 8.700e+01 9.418e+01 1.001e+02 1.210e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 09:52:35,494 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7100, loss[loss=0.07513, simple_loss=0.1094, pruned_loss=0.01414, audio_tagging_loss=0.006285, over 15106.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09073, pruned_loss=0.01263, audio_tagging_loss=0.008941, over 3035172.35 frames. 
], batch size: 53, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:52:36,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3333846.6666666665, ans=0.125 2023-11-26 09:52:37,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3333846.6666666665, ans=0.125 2023-11-26 09:52:40,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3333846.6666666665, ans=0.0 2023-11-26 09:52:41,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3333846.6666666665, ans=0.125 2023-11-26 09:52:43,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3333846.6666666665, ans=0.1 2023-11-26 09:52:47,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3333913.3333333335, ans=0.125 2023-11-26 09:52:58,309 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500100 2023-11-26 09:53:16,884 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.37 vs. limit=15.0 2023-11-26 09:53:28,687 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0 2023-11-26 09:53:30,080 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7150, loss[loss=0.06745, simple_loss=0.09096, pruned_loss=0.01317, audio_tagging_loss=0.008804, over 15624.00 frames. ], tot_loss[loss=0.06769, simple_loss=0.09208, pruned_loss=0.01278, audio_tagging_loss=0.008879, over 3045818.41 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:53:35,583 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.20 vs. 
limit=15.0 2023-11-26 09:53:54,574 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500150 2023-11-26 09:54:03,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3334380.0, ans=0.125 2023-11-26 09:54:11,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3334380.0, ans=0.125 2023-11-26 09:54:18,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3334446.6666666665, ans=0.125 2023-11-26 09:54:18,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3334446.6666666665, ans=0.125 2023-11-26 09:54:19,547 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:54:21,462 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.118e+01 8.683e+01 9.180e+01 1.005e+02 1.351e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-26 09:54:21,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3334446.6666666665, ans=0.125 2023-11-26 09:54:26,328 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7200, loss[loss=0.06775, simple_loss=0.09353, pruned_loss=0.01117, audio_tagging_loss=0.009823, over 15938.00 frames. ], tot_loss[loss=0.0673, simple_loss=0.09132, pruned_loss=0.01265, audio_tagging_loss=0.008994, over 3043233.26 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:54:29,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3334513.3333333335, ans=0.125 2023-11-26 09:54:39,714 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.18 vs. limit=15.0 2023-11-26 09:54:41,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3334580.0, ans=0.1 2023-11-26 09:54:47,969 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.87 vs. limit=12.0 2023-11-26 09:54:49,700 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500200 2023-11-26 09:54:50,287 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.52 vs. limit=15.0 2023-11-26 09:54:51,521 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.20 vs. limit=22.5 2023-11-26 09:54:57,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3334646.6666666665, ans=0.125 2023-11-26 09:55:10,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3334780.0, ans=15.0 2023-11-26 09:55:22,866 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7250, loss[loss=0.0603, simple_loss=0.07709, pruned_loss=0.01161, audio_tagging_loss=0.01015, over 14900.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09123, pruned_loss=0.01266, audio_tagging_loss=0.008941, over 3044440.18 frames. 
], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:55:45,981 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500250 2023-11-26 09:56:03,855 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.70 vs. limit=12.0 2023-11-26 09:56:06,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3335113.3333333335, ans=0.0 2023-11-26 09:56:15,146 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.837e+01 8.852e+01 9.361e+01 1.013e+02 1.372e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-26 09:56:18,397 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7300, loss[loss=0.05453, simple_loss=0.07525, pruned_loss=0.008029, audio_tagging_loss=0.00888, over 15691.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09076, pruned_loss=0.01246, audio_tagging_loss=0.008849, over 3039573.80 frames. ], batch size: 62, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:56:19,096 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.13 vs. limit=15.0 2023-11-26 09:56:24,783 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.99 vs. limit=22.5 2023-11-26 09:56:34,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3335246.6666666665, ans=0.05 2023-11-26 09:56:38,638 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.51 vs. limit=12.0 2023-11-26 09:56:42,133 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500300 2023-11-26 09:56:44,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3335313.3333333335, ans=0.0 2023-11-26 09:56:45,297 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.64 vs. limit=6.0 2023-11-26 09:56:53,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3335380.0, ans=0.05 2023-11-26 09:56:53,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3335380.0, ans=0.125 2023-11-26 09:56:56,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3335380.0, ans=0.0 2023-11-26 09:56:59,147 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.96 vs. limit=15.0 2023-11-26 09:57:02,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3335446.6666666665, ans=0.125 2023-11-26 09:57:04,074 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.31 vs. 
limit=22.5 2023-11-26 09:57:07,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3335446.6666666665, ans=0.1 2023-11-26 09:57:08,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3335446.6666666665, ans=0.1 2023-11-26 09:57:14,269 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7350, loss[loss=0.04295, simple_loss=0.05267, pruned_loss=0.008269, audio_tagging_loss=0.008352, over 15736.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.091, pruned_loss=0.01256, audio_tagging_loss=0.008719, over 3039737.63 frames. ], batch size: 62, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:57:19,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3335513.3333333335, ans=0.0 2023-11-26 09:57:37,771 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500350 2023-11-26 09:57:39,970 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:57:43,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=3335646.6666666665, ans=22.5 2023-11-26 09:57:51,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3335713.3333333335, ans=0.2 2023-11-26 09:57:54,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3335713.3333333335, ans=0.125 2023-11-26 09:57:56,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3335713.3333333335, ans=0.125 2023-11-26 09:57:56,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3335713.3333333335, ans=0.125 2023-11-26 09:58:06,799 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.208e+01 8.616e+01 9.240e+01 1.003e+02 1.313e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-26 09:58:09,961 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7400, loss[loss=0.0684, simple_loss=0.09197, pruned_loss=0.01253, audio_tagging_loss=0.009893, over 14980.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09088, pruned_loss=0.01246, audio_tagging_loss=0.008717, over 3048536.32 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:58:10,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=3335846.6666666665, ans=0.02 2023-11-26 09:58:23,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3335913.3333333335, ans=0.1 2023-11-26 09:58:31,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3335980.0, ans=0.0 2023-11-26 09:58:33,285 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500400 2023-11-26 09:59:05,895 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7450, loss[loss=0.05724, simple_loss=0.08145, pruned_loss=0.008966, audio_tagging_loss=0.007549, over 15642.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09037, pruned_loss=0.01238, audio_tagging_loss=0.008713, over 3053262.85 frames. 
], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:59:07,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3336180.0, ans=0.04949747468305833 2023-11-26 09:59:29,772 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500450 2023-11-26 09:59:58,272 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.569e+01 8.877e+01 9.404e+01 1.015e+02 1.370e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 10:00:01,515 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7500, loss[loss=0.07821, simple_loss=0.1036, pruned_loss=0.01479, audio_tagging_loss=0.01162, over 14467.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09045, pruned_loss=0.01249, audio_tagging_loss=0.008686, over 3049792.39 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 10:00:14,958 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.39 vs. limit=8.0 2023-11-26 10:00:25,307 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500500 2023-11-26 10:00:57,058 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.13 vs. limit=15.0 2023-11-26 10:00:57,572 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7550, loss[loss=0.07764, simple_loss=0.1018, pruned_loss=0.01571, audio_tagging_loss=0.011, over 14319.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09017, pruned_loss=0.01245, audio_tagging_loss=0.008632, over 3047371.85 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 10:00:57,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3336846.6666666665, ans=0.0 2023-11-26 10:01:02,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3336846.6666666665, ans=0.125 2023-11-26 10:01:21,055 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500550 2023-11-26 10:01:23,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3336980.0, ans=0.1 2023-11-26 10:01:38,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3337046.6666666665, ans=0.125 2023-11-26 10:01:49,615 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.075e+01 8.478e+01 8.999e+01 9.554e+01 1.187e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-26 10:01:51,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3337113.3333333335, ans=0.125 2023-11-26 10:01:53,388 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7600, loss[loss=0.06245, simple_loss=0.08158, pruned_loss=0.01009, audio_tagging_loss=0.01157, over 15207.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.0906, pruned_loss=0.01274, audio_tagging_loss=0.00857, over 3047718.22 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:02:05,176 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.74 vs. 
limit=15.0 2023-11-26 10:02:17,254 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500600 2023-11-26 10:02:20,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3337313.3333333335, ans=0.125 2023-11-26 10:02:34,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3337380.0, ans=0.1 2023-11-26 10:02:49,122 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7650, loss[loss=0.07076, simple_loss=0.09646, pruned_loss=0.01338, audio_tagging_loss=0.009146, over 16164.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.0891, pruned_loss=0.01247, audio_tagging_loss=0.008687, over 3040151.16 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:02:51,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3337513.3333333335, ans=0.125 2023-11-26 10:03:09,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3337580.0, ans=0.125 2023-11-26 10:03:12,804 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500650 2023-11-26 10:03:35,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3337780.0, ans=0.125 2023-11-26 10:03:41,954 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.661e+01 8.670e+01 9.156e+01 1.010e+02 1.245e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-26 10:03:45,679 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7700, loss[loss=0.07619, simple_loss=0.1096, pruned_loss=0.01532, audio_tagging_loss=0.006083, over 14972.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08902, pruned_loss=0.01227, audio_tagging_loss=0.008656, over 3040061.57 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:03:45,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3337846.6666666665, ans=0.125 2023-11-26 10:03:48,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3337846.6666666665, ans=0.125 2023-11-26 10:04:08,441 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500700 2023-11-26 10:04:14,686 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.33 vs. limit=15.0 2023-11-26 10:04:19,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3338046.6666666665, ans=0.125 2023-11-26 10:04:34,994 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.06 vs. limit=22.5 2023-11-26 10:04:39,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3338180.0, ans=0.0 2023-11-26 10:04:40,565 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7750, loss[loss=0.06734, simple_loss=0.09469, pruned_loss=0.01282, audio_tagging_loss=0.007173, over 15517.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08941, pruned_loss=0.01236, audio_tagging_loss=0.008685, over 3032897.71 frames. 
], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:04:45,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3338180.0, ans=0.0 2023-11-26 10:04:48,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3338180.0, ans=0.0 2023-11-26 10:04:53,805 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.92 vs. limit=15.0 2023-11-26 10:05:04,473 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500750 2023-11-26 10:05:05,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3338313.3333333335, ans=0.1 2023-11-26 10:05:17,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3338380.0, ans=0.125 2023-11-26 10:05:20,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3338380.0, ans=0.0 2023-11-26 10:05:32,990 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.118e+01 9.045e+01 9.530e+01 1.049e+02 1.522e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 10:05:36,763 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7800, loss[loss=0.08297, simple_loss=0.1203, pruned_loss=0.01444, audio_tagging_loss=0.008364, over 15461.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09003, pruned_loss=0.01255, audio_tagging_loss=0.008688, over 3038988.13 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:05:53,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3338580.0, ans=0.125 2023-11-26 10:06:00,207 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500800 2023-11-26 10:06:11,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3338713.3333333335, ans=0.0 2023-11-26 10:06:32,801 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7850, loss[loss=0.05402, simple_loss=0.0693, pruned_loss=0.007953, audio_tagging_loss=0.01142, over 15027.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09034, pruned_loss=0.01251, audio_tagging_loss=0.0087, over 3040361.02 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:06:53,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3338980.0, ans=0.125 2023-11-26 10:06:55,621 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500850 2023-11-26 10:07:19,391 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.36 vs. limit=6.0 2023-11-26 10:07:25,272 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.991e+01 9.385e+01 9.905e+01 1.223e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 10:07:26,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3339113.3333333335, ans=0.125 2023-11-26 10:07:28,430 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7900, loss[loss=0.04413, simple_loss=0.05852, pruned_loss=0.007178, audio_tagging_loss=0.007691, over 14246.00 frames. 
], tot_loss[loss=0.06634, simple_loss=0.09049, pruned_loss=0.01234, audio_tagging_loss=0.008759, over 3046846.07 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:07:42,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3339246.6666666665, ans=0.125 2023-11-26 10:07:52,006 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500900 2023-11-26 10:07:59,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3339313.3333333335, ans=0.125 2023-11-26 10:08:22,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3339513.3333333335, ans=0.125 2023-11-26 10:08:23,021 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 7950, loss[loss=0.06537, simple_loss=0.07981, pruned_loss=0.01341, audio_tagging_loss=0.01205, over 14966.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08924, pruned_loss=0.01215, audio_tagging_loss=0.00884, over 3043101.99 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:08:38,282 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 10:08:40,912 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.39 vs. limit=22.5 2023-11-26 10:08:42,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3339580.0, ans=0.125 2023-11-26 10:08:47,340 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 500950 2023-11-26 10:08:47,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3339646.6666666665, ans=0.0 2023-11-26 10:08:57,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3339713.3333333335, ans=0.1 2023-11-26 10:09:15,820 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.552e+01 8.923e+01 9.464e+01 1.020e+02 1.321e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 10:09:18,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3339846.6666666665, ans=0.025 2023-11-26 10:09:19,575 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8000, loss[loss=0.06479, simple_loss=0.09264, pruned_loss=0.01177, audio_tagging_loss=0.006703, over 15173.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08885, pruned_loss=0.01207, audio_tagging_loss=0.008935, over 3042474.91 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:09:20,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3339846.6666666665, ans=0.2 2023-11-26 10:09:39,713 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.91 vs. 
limit=22.5 2023-11-26 10:09:42,393 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501000 2023-11-26 10:09:46,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3339980.0, ans=0.0 2023-11-26 10:09:54,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3340046.6666666665, ans=0.125 2023-11-26 10:10:12,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3340113.3333333335, ans=0.0 2023-11-26 10:10:15,405 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8050, loss[loss=0.04356, simple_loss=0.04826, pruned_loss=0.004562, audio_tagging_loss=0.01487, over 14469.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08839, pruned_loss=0.01212, audio_tagging_loss=0.009048, over 3044352.33 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:10:21,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3340180.0, ans=0.95 2023-11-26 10:10:30,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3340246.6666666665, ans=0.0 2023-11-26 10:10:38,633 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501050 2023-11-26 10:10:42,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3340313.3333333335, ans=0.0 2023-11-26 10:10:43,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3340313.3333333335, ans=0.125 2023-11-26 10:10:51,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3340380.0, ans=0.125 2023-11-26 10:10:59,551 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.61 vs. limit=15.0 2023-11-26 10:11:01,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3340446.6666666665, ans=0.1 2023-11-26 10:11:07,297 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.954e+01 8.742e+01 9.437e+01 9.965e+01 1.262e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 10:11:10,521 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8100, loss[loss=0.06114, simple_loss=0.08171, pruned_loss=0.009191, audio_tagging_loss=0.0111, over 14790.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08803, pruned_loss=0.01209, audio_tagging_loss=0.009015, over 3043750.44 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:11:29,763 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.49 vs. 
limit=15.0 2023-11-26 10:11:35,130 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501100 2023-11-26 10:11:42,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3340646.6666666665, ans=0.125 2023-11-26 10:11:47,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3340713.3333333335, ans=0.125 2023-11-26 10:12:07,297 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8150, loss[loss=0.07458, simple_loss=0.1087, pruned_loss=0.01528, audio_tagging_loss=0.004965, over 15697.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08887, pruned_loss=0.01229, audio_tagging_loss=0.008939, over 3040678.27 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:12:30,305 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501150 2023-11-26 10:12:33,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3340980.0, ans=0.0 2023-11-26 10:12:35,018 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.93 vs. limit=22.5 2023-11-26 10:12:40,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3341046.6666666665, ans=0.0 2023-11-26 10:12:40,925 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.51 vs. limit=22.5 2023-11-26 10:12:46,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3341046.6666666665, ans=0.125 2023-11-26 10:12:51,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3341113.3333333335, ans=0.125 2023-11-26 10:12:54,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3341113.3333333335, ans=0.125 2023-11-26 10:13:00,389 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.443e+01 8.798e+01 9.302e+01 1.005e+02 1.230e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-26 10:13:02,550 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8200, loss[loss=0.06489, simple_loss=0.09759, pruned_loss=0.008661, audio_tagging_loss=0.007428, over 15063.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08877, pruned_loss=0.01219, audio_tagging_loss=0.008766, over 3049338.32 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 10:13:03,704 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 10:13:25,288 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501200 2023-11-26 10:13:57,335 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8250, loss[loss=0.06504, simple_loss=0.09296, pruned_loss=0.01072, audio_tagging_loss=0.007842, over 15856.00 frames. 
], tot_loss[loss=0.06553, simple_loss=0.08925, pruned_loss=0.01223, audio_tagging_loss=0.008664, over 3050030.05 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 10:13:57,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3341513.3333333335, ans=0.125 2023-11-26 10:14:04,834 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.77 vs. limit=10.0 2023-11-26 10:14:06,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3341513.3333333335, ans=0.0 2023-11-26 10:14:21,700 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501250 2023-11-26 10:14:38,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3341713.3333333335, ans=0.125 2023-11-26 10:14:40,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3341780.0, ans=0.125 2023-11-26 10:14:40,902 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:14:50,641 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.634e+01 8.840e+01 9.471e+01 1.008e+02 1.505e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-26 10:14:52,773 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8300, loss[loss=0.06863, simple_loss=0.09793, pruned_loss=0.014, audio_tagging_loss=0.005664, over 15454.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08938, pruned_loss=0.01222, audio_tagging_loss=0.008644, over 3055879.49 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 10:14:56,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3341846.6666666665, ans=0.1 2023-11-26 10:14:57,545 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.41 vs. limit=12.0 2023-11-26 10:15:09,666 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.73 vs. limit=15.0 2023-11-26 10:15:16,623 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501300 2023-11-26 10:15:22,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3341980.0, ans=0.1 2023-11-26 10:15:49,213 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8350, loss[loss=0.0405, simple_loss=0.05126, pruned_loss=0.00493, audio_tagging_loss=0.009939, over 15297.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08936, pruned_loss=0.01222, audio_tagging_loss=0.008629, over 3054066.88 frames. 
], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 10:16:11,941 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501350 2023-11-26 10:16:27,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3342380.0, ans=0.0 2023-11-26 10:16:34,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3342446.6666666665, ans=0.1 2023-11-26 10:16:42,099 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.412e+01 8.918e+01 9.503e+01 1.018e+02 1.589e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-26 10:16:44,296 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8400, loss[loss=0.07048, simple_loss=0.09019, pruned_loss=0.01568, audio_tagging_loss=0.009704, over 15111.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08861, pruned_loss=0.0122, audio_tagging_loss=0.008738, over 3052278.05 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:16:45,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3342513.3333333335, ans=0.0 2023-11-26 10:16:48,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3342513.3333333335, ans=0.0 2023-11-26 10:17:08,418 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501400 2023-11-26 10:17:15,987 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.39 vs. limit=15.0 2023-11-26 10:17:18,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3342713.3333333335, ans=0.0 2023-11-26 10:17:21,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3342713.3333333335, ans=0.0 2023-11-26 10:17:25,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3342713.3333333335, ans=0.125 2023-11-26 10:17:28,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3342780.0, ans=0.125 2023-11-26 10:17:40,347 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8450, loss[loss=0.07405, simple_loss=0.09615, pruned_loss=0.01607, audio_tagging_loss=0.009912, over 15224.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.089, pruned_loss=0.01234, audio_tagging_loss=0.008762, over 3046862.16 frames. 
], batch size: 55, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:17:48,441 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:17:57,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3342913.3333333335, ans=0.0 2023-11-26 10:18:00,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3342913.3333333335, ans=0.0 2023-11-26 10:18:03,584 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501450 2023-11-26 10:18:21,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3343046.6666666665, ans=0.0 2023-11-26 10:18:33,660 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.695e+01 8.791e+01 9.317e+01 9.949e+01 1.409e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-26 10:18:34,230 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.07 vs. limit=15.0 2023-11-26 10:18:36,370 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8500, loss[loss=0.04326, simple_loss=0.0535, pruned_loss=0.006738, audio_tagging_loss=0.009776, over 13625.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08915, pruned_loss=0.01232, audio_tagging_loss=0.008726, over 3045287.64 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:18:38,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3343180.0, ans=0.0 2023-11-26 10:18:48,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3343246.6666666665, ans=0.0 2023-11-26 10:18:53,800 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.95 vs. limit=15.0 2023-11-26 10:18:57,019 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0 2023-11-26 10:18:59,112 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501500 2023-11-26 10:19:04,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3343313.3333333335, ans=0.125 2023-11-26 10:19:31,405 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8550, loss[loss=0.06698, simple_loss=0.09032, pruned_loss=0.01499, audio_tagging_loss=0.006838, over 14441.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08893, pruned_loss=0.01222, audio_tagging_loss=0.008861, over 3044064.11 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:19:54,772 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501550 2023-11-26 10:20:01,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3343646.6666666665, ans=0.125 2023-11-26 10:20:09,137 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.79 vs. 
2023-11-26 10:20:13,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3343713.3333333335, ans=0.125
2023-11-26 10:20:15,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3343780.0, ans=0.1
2023-11-26 10:20:21,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3343780.0, ans=0.1
2023-11-26 10:20:24,833 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.108e+01 8.767e+01 9.434e+01 1.006e+02 1.411e+02, threshold=1.887e+02, percent-clipped=0.0
2023-11-26 10:20:26,932 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8600, loss[loss=0.06511, simple_loss=0.0879, pruned_loss=0.01193, audio_tagging_loss=0.00923, over 15376.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08865, pruned_loss=0.01222, audio_tagging_loss=0.008844, over 3036927.30 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 32.0
2023-11-26 10:20:37,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3343913.3333333335, ans=0.125
2023-11-26 10:20:50,923 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501600
2023-11-26 10:21:02,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3344046.6666666665, ans=0.1
2023-11-26 10:21:05,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3344046.6666666665, ans=0.0
2023-11-26 10:21:14,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3344113.3333333335, ans=0.1
2023-11-26 10:21:23,363 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8650, loss[loss=0.07093, simple_loss=0.09554, pruned_loss=0.01286, audio_tagging_loss=0.01029, over 15761.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08961, pruned_loss=0.01215, audio_tagging_loss=0.00885, over 3035698.68 frames. ], batch size: 62, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 10:21:32,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3344180.0, ans=0.125
2023-11-26 10:21:33,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3344246.6666666665, ans=0.0
2023-11-26 10:21:46,186 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501650
2023-11-26 10:22:18,148 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.843e+01 9.565e+01 1.046e+02 1.310e+02, threshold=1.913e+02, percent-clipped=0.0
2023-11-26 10:22:19,250 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8700, loss[loss=0.05952, simple_loss=0.08098, pruned_loss=0.01035, audio_tagging_loss=0.008679, over 15389.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08945, pruned_loss=0.01222, audio_tagging_loss=0.008954, over 3042668.85 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0
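The per-batch loss entries are consistent with the total being a weighted sum of the transducer terms plus the audio-tagging term, with the simple (linear) transducer loss down-weighted by 0.5 and the pruned and audio-tagging losses entering at full weight once warm-up is over. A worked check against the batch 8600 entry above (a reading of the logged numbers, not a quote of the training code):

    # Assumed combination; the numbers are taken from the "batch 8600" entry.
    simple_loss = 0.0879
    pruned_loss = 0.01193
    audio_tagging_loss = 0.00923
    loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
    print(loss)  # 0.06511, matching the logged loss=0.06511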
2023-11-26 10:22:19,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3344513.3333333335, ans=0.2
2023-11-26 10:22:24,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3344513.3333333335, ans=0.0
2023-11-26 10:22:31,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3344580.0, ans=0.0
2023-11-26 10:22:41,947 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.24 vs. limit=22.5
2023-11-26 10:22:42,578 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501700
2023-11-26 10:22:47,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3344646.6666666665, ans=0.05
2023-11-26 10:22:52,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3344713.3333333335, ans=0.05
2023-11-26 10:22:55,549 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.26 vs. limit=15.0
2023-11-26 10:23:04,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3344780.0, ans=0.0
2023-11-26 10:23:15,037 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8750, loss[loss=0.07004, simple_loss=0.09396, pruned_loss=0.01253, audio_tagging_loss=0.01053, over 15258.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08928, pruned_loss=0.01243, audio_tagging_loss=0.009, over 3040218.26 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 10:23:27,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3344913.3333333335, ans=0.125
2023-11-26 10:23:32,757 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.86 vs. limit=15.0
2023-11-26 10:23:38,506 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501750
2023-11-26 10:23:44,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3344980.0, ans=0.1
2023-11-26 10:23:45,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3344980.0, ans=0.125
2023-11-26 10:24:09,411 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.663e+01 8.969e+01 9.428e+01 9.946e+01 1.483e+02, threshold=1.886e+02, percent-clipped=0.0
2023-11-26 10:24:10,494 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8800, loss[loss=0.07555, simple_loss=0.1021, pruned_loss=0.01649, audio_tagging_loss=0.007998, over 15154.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08939, pruned_loss=0.01245, audio_tagging_loss=0.009066, over 3032730.03 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 32.0
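The displayed learning rate (1.61e-03 decaying to 1.60e-03 across this section) is consistent with icefall's Eden schedule, which discounts a base LR by both the batch count and the epoch count, each with its own horizon. A sketch under that assumption, with the base LR and horizons taken to be the values printed at startup for this run:

    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        # Eden-style schedule (sketch): two independent inverse-quartic decays.
        return (
            base_lr
            * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
            * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        )

    # Assumed hyper-parameters; batch index taken from the log above.
    print(eden_lr(0.045, batch=503000, epoch=42))  # ~1.6e-03, matching "lr: 1.60e-03"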
2023-11-26 10:24:11,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3345180.0, ans=0.125
2023-11-26 10:24:28,278 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.26 vs. limit=15.0
2023-11-26 10:24:34,019 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501800
2023-11-26 10:24:54,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3345446.6666666665, ans=0.1
2023-11-26 10:25:06,488 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8850, loss[loss=0.06012, simple_loss=0.08097, pruned_loss=0.01212, audio_tagging_loss=0.007517, over 14689.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.08993, pruned_loss=0.01245, audio_tagging_loss=0.00893, over 3032714.17 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 10:25:13,217 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.32 vs. limit=5.0
2023-11-26 10:25:14,003 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.08 vs. limit=22.5
2023-11-26 10:25:18,735 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 10:25:26,344 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0
2023-11-26 10:25:29,770 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501850
2023-11-26 10:25:46,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3345713.3333333335, ans=0.2
2023-11-26 10:26:01,685 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8900, loss[loss=0.05915, simple_loss=0.07292, pruned_loss=0.01169, audio_tagging_loss=0.011, over 14690.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09032, pruned_loss=0.01231, audio_tagging_loss=0.008884, over 3038346.62 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 8.0
2023-11-26 10:26:02,718 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.355e+01 8.662e+01 9.297e+01 1.004e+02 1.286e+02, threshold=1.859e+02, percent-clipped=0.0
2023-11-26 10:26:15,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3345913.3333333335, ans=0.0
2023-11-26 10:26:18,156 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.23 vs. limit=22.5
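The WARNING above shows the guard that drops pathological cuts: this AudioSet clip carries only a placeholder transcript, and after subsampling its 100 input frames shrink to 23, fewer than its 24 BPE tokens, so the transducer loss would be undefined. A minimal sketch of such a filter; the exact front-end margin is an assumption chosen to reproduce the 100 -> 23 figure in the log, and keep_cut is a hypothetical helper name:

    def keep_cut(num_frames: int, tokens: list, subsampling_factor: int = 4) -> bool:
        # Transducer loss needs at least as many encoder frames as output tokens.
        # The "- 7" convolutional margin is assumed; it matches 100 -> 23 above.
        frames_after = (num_frames - 7) // subsampling_factor
        return frames_after >= len(tokens)

    # The excluded cut from the log: 100 frames -> 23 after subsampling, 24 tokens.
    print(keep_cut(100, ["tok"] * 24))  # False -> "Exclude cut ... from training."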
2023-11-26 10:26:20,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3345913.3333333335, ans=0.5
2023-11-26 10:26:21,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3345913.3333333335, ans=0.1
2023-11-26 10:26:25,478 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501900
2023-11-26 10:26:47,236 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.71 vs. limit=15.0
2023-11-26 10:26:48,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3346113.3333333335, ans=0.125
2023-11-26 10:26:51,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3346113.3333333335, ans=0.0
2023-11-26 10:26:53,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3346113.3333333335, ans=0.1
2023-11-26 10:26:57,799 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 8950, loss[loss=0.05826, simple_loss=0.07843, pruned_loss=0.01062, audio_tagging_loss=0.008422, over 14336.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08977, pruned_loss=0.01209, audio_tagging_loss=0.008791, over 3043087.89 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 8.0
2023-11-26 10:27:03,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3346180.0, ans=0.125
2023-11-26 10:27:12,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3346246.6666666665, ans=0.125
2023-11-26 10:27:13,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3346246.6666666665, ans=0.2
2023-11-26 10:27:21,176 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 501950
2023-11-26 10:27:21,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3346313.3333333335, ans=0.125
2023-11-26 10:27:27,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3346313.3333333335, ans=0.0
2023-11-26 10:27:31,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3346380.0, ans=0.0
2023-11-26 10:27:32,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3346380.0, ans=0.015
2023-11-26 10:27:35,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3346380.0, ans=0.0
2023-11-26 10:27:38,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3346380.0, ans=0.2
2023-11-26 10:27:41,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3346446.6666666665, ans=0.0
2023-11-26 10:27:41,876 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.42 vs. limit=22.5
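The scaling.py:1022 Whitening lines compare a covariance-based "whiteness" metric of a module's activations against a limit; the auxiliary whitening penalty only engages when the metric exceeds the limit, which is why entries such as metric=21.42 vs. limit=22.5 just above have no effect on the loss. A rough sketch of the measurement, not icefall's exact formula:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        """x: (num_frames, num_channels). Returns ~1.0 for isotropic ("white")
        activations and grows when a few directions dominate (a rough sketch)."""
        num_channels = x.shape[-1]
        c = num_channels // num_groups
        x = x.reshape(-1, num_groups, c).transpose(0, 1)  # (groups, frames, c)
        cov = x.transpose(1, 2) @ x / x.shape[1]          # per-group covariance
        eigs = torch.linalg.eigvalsh(cov)                 # nonnegative eigenvalues
        # E[eig^2] / (E[eig])^2 equals 1.0 iff all eigenvalues are equal
        return (eigs.pow(2).mean() / eigs.mean().pow(2)).item()

    x = torch.randn(1000, 384)
    print(whitening_metric(x))  # modestly above 1.0 for random, near-white features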
2023-11-26 10:27:42,929 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.42 vs. limit=6.0
2023-11-26 10:27:53,378 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9000, loss[loss=0.08529, simple_loss=0.1177, pruned_loss=0.01959, audio_tagging_loss=0.006824, over 14086.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08975, pruned_loss=0.01222, audio_tagging_loss=0.008714, over 3044818.23 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 8.0
2023-11-26 10:27:53,380 INFO [train_asr.py:1258] (0/4) Computing validation loss
2023-11-26 10:28:09,242 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.7944, 5.8167, 5.8905, 5.8551], device='cuda:0')
2023-11-26 10:28:26,047 INFO [train_asr.py:1267] (0/4) Epoch 42, validation: loss=0.05901, simple_loss=0.0506, pruned_loss=0.005264, audio_tagging_loss=0.02845, over 4681554.00 frames.
2023-11-26 10:28:26,048 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB
2023-11-26 10:28:27,064 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 8.869e+01 9.478e+01 9.908e+01 1.192e+02, threshold=1.896e+02, percent-clipped=0.0
2023-11-26 10:28:34,322 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0
2023-11-26 10:28:40,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3346580.0, ans=0.0
2023-11-26 10:28:49,210 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502000
2023-11-26 10:29:09,871 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0
2023-11-26 10:29:14,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3346780.0, ans=0.0
2023-11-26 10:29:14,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3346780.0, ans=0.0
2023-11-26 10:29:21,684 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9050, loss[loss=0.06972, simple_loss=0.104, pruned_loss=0.009744, audio_tagging_loss=0.00795, over 16196.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08999, pruned_loss=0.01219, audio_tagging_loss=0.00865, over 3049178.06 frames. ], batch size: 62, lr: 1.60e-03, grad_scale: 8.0
2023-11-26 10:29:27,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3346846.6666666665, ans=0.5
2023-11-26 10:29:38,571 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.10 vs. limit=10.0
2023-11-26 10:29:44,479 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502050
2023-11-26 10:30:09,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3347113.3333333335, ans=0.2
2023-11-26 10:30:17,416 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9100, loss[loss=0.07631, simple_loss=0.1068, pruned_loss=0.01766, audio_tagging_loss=0.00526, over 15038.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.0904, pruned_loss=0.01221, audio_tagging_loss=0.008571, over 3049382.24 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 8.0
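At batch 9000 the trainer pauses for the periodic validation pass logged above: it switches to "Computing validation loss", prints attention-entropy diagnostics, reports a frames-weighted loss over the whole dev set (4681554 frames), and notes peak CUDA memory. A hedged sketch of that loop, with compute_loss as a hypothetical helper rather than the actual train_asr.py interface:

    import torch

    def validate(model, dev_loader, device):
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():
            for batch in dev_loader:
                loss, num_frames = model.compute_loss(batch)  # hypothetical helper
                tot_loss += loss.item() * num_frames          # frames-weighted sum
                tot_frames += num_frames
        model.train()
        # mirrors the "validation: loss=... over N frames." line in the log
        print(f"validation: loss={tot_loss / tot_frames:.5f}, over {tot_frames:.2f} frames.")
        mb = torch.cuda.max_memory_allocated(device) // 2**20
        print(f"Maximum memory allocated so far is {mb}MB")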
], tot_loss[loss=0.06598, simple_loss=0.0904, pruned_loss=0.01221, audio_tagging_loss=0.008571, over 3049382.24 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 10:30:18,471 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.744e+01 8.587e+01 9.359e+01 1.002e+02 1.217e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-26 10:30:21,111 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.34 vs. limit=22.5 2023-11-26 10:30:27,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3347246.6666666665, ans=0.0 2023-11-26 10:30:41,772 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502100 2023-11-26 10:30:47,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3347313.3333333335, ans=0.0 2023-11-26 10:30:47,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3347313.3333333335, ans=0.125 2023-11-26 10:30:50,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3347313.3333333335, ans=0.125 2023-11-26 10:31:13,164 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9150, loss[loss=0.06921, simple_loss=0.0957, pruned_loss=0.0129, audio_tagging_loss=0.00847, over 15297.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.0902, pruned_loss=0.01223, audio_tagging_loss=0.008636, over 3052288.03 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 10:31:29,243 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.63 vs. limit=10.0 2023-11-26 10:31:33,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3347580.0, ans=0.125 2023-11-26 10:31:37,543 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502150 2023-11-26 10:31:50,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3347713.3333333335, ans=0.0 2023-11-26 10:31:54,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3347713.3333333335, ans=0.1 2023-11-26 10:32:04,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3347780.0, ans=0.125 2023-11-26 10:32:08,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3347780.0, ans=0.07 2023-11-26 10:32:10,075 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9200, loss[loss=0.0501, simple_loss=0.06477, pruned_loss=0.009475, audio_tagging_loss=0.008242, over 14746.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.09023, pruned_loss=0.01225, audio_tagging_loss=0.008589, over 3049751.11 frames. 
2023-11-26 10:32:11,100 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.386e+01 8.621e+01 9.324e+01 1.045e+02 1.374e+02, threshold=1.865e+02, percent-clipped=0.0
2023-11-26 10:32:12,692 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.98 vs. limit=15.0
2023-11-26 10:32:16,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3347846.6666666665, ans=0.0
2023-11-26 10:32:33,159 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502200
2023-11-26 10:32:37,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3347980.0, ans=0.125
2023-11-26 10:32:54,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3348113.3333333335, ans=0.2
2023-11-26 10:32:54,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3348113.3333333335, ans=0.0
2023-11-26 10:32:55,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3348113.3333333335, ans=0.125
2023-11-26 10:32:57,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3348113.3333333335, ans=0.0
2023-11-26 10:33:04,909 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0
2023-11-26 10:33:06,575 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9250, loss[loss=0.05795, simple_loss=0.07438, pruned_loss=0.01063, audio_tagging_loss=0.01014, over 14759.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08915, pruned_loss=0.01217, audio_tagging_loss=0.008658, over 3049980.27 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 10:33:09,393 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0
2023-11-26 10:33:30,094 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502250
2023-11-26 10:33:33,433 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.19 vs. limit=15.0
2023-11-26 10:33:35,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3348313.3333333335, ans=0.1
2023-11-26 10:33:44,306 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.46 vs. limit=10.0
2023-11-26 10:33:50,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3348446.6666666665, ans=0.0
2023-11-26 10:33:52,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3348446.6666666665, ans=0.0
2023-11-26 10:33:55,084 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.26 vs. limit=15.0
2023-11-26 10:34:02,027 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9300, loss[loss=0.06174, simple_loss=0.08787, pruned_loss=0.009959, audio_tagging_loss=0.00785, over 16159.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08959, pruned_loss=0.01228, audio_tagging_loss=0.00864, over 3053950.62 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 10:34:03,059 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.207e+01 8.725e+01 9.420e+01 1.023e+02 1.550e+02, threshold=1.884e+02, percent-clipped=0.0
2023-11-26 10:34:07,259 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.18 vs. limit=15.0
2023-11-26 10:34:16,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3348580.0, ans=0.2
2023-11-26 10:34:21,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3348580.0, ans=0.0
2023-11-26 10:34:26,038 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502300
2023-11-26 10:34:27,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3348646.6666666665, ans=0.125
2023-11-26 10:34:29,715 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.24 vs. limit=15.0
2023-11-26 10:34:49,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3348780.0, ans=0.0
2023-11-26 10:34:57,938 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9350, loss[loss=0.06163, simple_loss=0.08464, pruned_loss=0.01041, audio_tagging_loss=0.008897, over 14693.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.0899, pruned_loss=0.01238, audio_tagging_loss=0.008649, over 3047804.47 frames. ], batch size: 59, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 10:35:18,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3348913.3333333335, ans=0.0
2023-11-26 10:35:21,324 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502350
2023-11-26 10:35:22,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3348980.0, ans=0.1
2023-11-26 10:35:27,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3348980.0, ans=0.125
2023-11-26 10:35:28,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3348980.0, ans=0.04949747468305833
2023-11-26 10:35:37,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3349046.6666666665, ans=0.0
2023-11-26 10:35:54,259 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9400, loss[loss=0.08195, simple_loss=0.111, pruned_loss=0.01897, audio_tagging_loss=0.007475, over 14610.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08978, pruned_loss=0.01234, audio_tagging_loss=0.008737, over 3048605.93 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 10:35:55,295 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.423e+01 8.779e+01 9.527e+01 1.025e+02 1.453e+02, threshold=1.905e+02, percent-clipped=0.0
2023-11-26 10:36:00,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3349180.0, ans=0.125
2023-11-26 10:36:16,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3349313.3333333335, ans=0.1
2023-11-26 10:36:17,351 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502400
2023-11-26 10:36:27,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3349380.0, ans=0.125
2023-11-26 10:36:28,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3349380.0, ans=0.125
2023-11-26 10:36:50,156 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9450, loss[loss=0.06031, simple_loss=0.08251, pruned_loss=0.008154, audio_tagging_loss=0.0109, over 15245.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08961, pruned_loss=0.0123, audio_tagging_loss=0.008835, over 3050580.56 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 10:36:50,199 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 10:36:56,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3349513.3333333335, ans=0.1
2023-11-26 10:37:00,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3349580.0, ans=0.0
2023-11-26 10:37:03,689 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 10:37:08,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3349580.0, ans=0.05
2023-11-26 10:37:11,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3349580.0, ans=0.025
2023-11-26 10:37:13,171 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.60 vs. limit=15.0
2023-11-26 10:37:14,702 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502450
2023-11-26 10:37:17,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3349646.6666666665, ans=0.125
2023-11-26 10:37:21,541 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.07 vs. limit=15.0
2023-11-26 10:37:32,378 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.88 vs. limit=10.0
2023-11-26 10:37:35,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3349780.0, ans=0.1
2023-11-26 10:37:37,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3349780.0, ans=0.05
2023-11-26 10:37:40,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3349780.0, ans=0.0
2023-11-26 10:37:46,047 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9500, loss[loss=0.06814, simple_loss=0.09915, pruned_loss=0.01062, audio_tagging_loss=0.007944, over 14846.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08896, pruned_loss=0.01224, audio_tagging_loss=0.009008, over 3046034.07 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 10:37:47,605 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.536e+01 8.787e+01 9.530e+01 1.013e+02 1.442e+02, threshold=1.906e+02, percent-clipped=0.0
2023-11-26 10:37:47,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3349846.6666666665, ans=0.125
2023-11-26 10:37:48,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3349846.6666666665, ans=0.1
2023-11-26 10:38:00,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3349913.3333333335, ans=0.125
2023-11-26 10:38:03,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3349913.3333333335, ans=0.125
2023-11-26 10:38:08,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3349980.0, ans=0.04949747468305833
2023-11-26 10:38:09,483 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502500
2023-11-26 10:38:20,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3350046.6666666665, ans=0.0
2023-11-26 10:38:21,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3350046.6666666665, ans=0.125
2023-11-26 10:38:24,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3350046.6666666665, ans=0.125
2023-11-26 10:38:27,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3350046.6666666665, ans=0.125
2023-11-26 10:38:29,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3350113.3333333335, ans=0.2
2023-11-26 10:38:42,488 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9550, loss[loss=0.04841, simple_loss=0.05728, pruned_loss=0.009628, audio_tagging_loss=0.01014, over 13749.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08935, pruned_loss=0.01224, audio_tagging_loss=0.009013, over 3037605.49 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 10:39:05,364 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502550
2023-11-26 10:39:20,393 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-26 10:39:21,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3350380.0, ans=0.025
2023-11-26 10:39:26,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3350446.6666666665, ans=0.1
2023-11-26 10:39:37,590 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9600, loss[loss=0.05843, simple_loss=0.07706, pruned_loss=0.009477, audio_tagging_loss=0.01042, over 16385.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09038, pruned_loss=0.01244, audio_tagging_loss=0.009011, over 3033750.52 frames. ], batch size: 63, lr: 1.60e-03, grad_scale: 32.0
2023-11-26 10:39:37,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3350513.3333333335, ans=0.0
2023-11-26 10:39:38,624 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.737e+01 8.684e+01 9.310e+01 1.004e+02 1.298e+02, threshold=1.862e+02, percent-clipped=0.0
2023-11-26 10:39:38,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3350513.3333333335, ans=0.125
2023-11-26 10:39:39,219 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.64 vs. limit=15.0
2023-11-26 10:39:46,870 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=22.5
2023-11-26 10:40:00,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3350646.6666666665, ans=0.2
2023-11-26 10:40:01,561 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502600
2023-11-26 10:40:03,554 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.21 vs. limit=22.5
2023-11-26 10:40:12,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3350713.3333333335, ans=0.125
2023-11-26 10:40:12,444 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.98 vs. limit=15.0
2023-11-26 10:40:24,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3350780.0, ans=0.125
2023-11-26 10:40:26,443 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.91 vs. limit=10.0
2023-11-26 10:40:32,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3350846.6666666665, ans=0.125
2023-11-26 10:40:33,883 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9650, loss[loss=0.03587, simple_loss=0.03334, pruned_loss=0.005945, audio_tagging_loss=0.01326, over 13849.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08972, pruned_loss=0.01228, audio_tagging_loss=0.009015, over 3033405.90 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 32.0
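The grad_scale value in the per-batch lines is the fp16 loss scale: across this section it halves from 32.0 down to 8.0 (around batch 8900) and climbs back to 32.0 by batch 9600, the classic dynamic-loss-scaling pattern where the scale is cut on overflowing steps and cautiously doubled after a run of clean ones. A sketch of that policy; the constants are illustrative, not icefall's exact GradScaler settings:

    class DynamicLossScale:
        """Generic dynamic fp16 loss scaling (sketch, illustrative constants)."""
        def __init__(self, scale: float = 32.0, growth_interval: int = 2000):
            self.scale = scale
            self.growth_interval = growth_interval
            self.clean_steps = 0

        def step(self, found_inf: bool):
            if found_inf:              # overflow: halve the scale, skip the update
                self.scale *= 0.5
                self.clean_steps = 0
            else:                      # after enough clean steps, double it again
                self.clean_steps += 1
                if self.clean_steps >= self.growth_interval:
                    self.scale *= 2.0
                    self.clean_steps = 0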
2023-11-26 10:40:38,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3350846.6666666665, ans=0.125
2023-11-26 10:40:55,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3350980.0, ans=0.125
2023-11-26 10:40:57,895 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502650
2023-11-26 10:41:25,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3351113.3333333335, ans=0.125
2023-11-26 10:41:28,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3351113.3333333335, ans=10.0
2023-11-26 10:41:30,419 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9700, loss[loss=0.09136, simple_loss=0.1298, pruned_loss=0.01917, audio_tagging_loss=0.007275, over 16351.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08995, pruned_loss=0.01237, audio_tagging_loss=0.008867, over 3032950.15 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 32.0
2023-11-26 10:41:31,448 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.274e+01 8.780e+01 9.294e+01 1.006e+02 1.332e+02, threshold=1.859e+02, percent-clipped=0.0
2023-11-26 10:41:49,640 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.25 vs. limit=15.0
2023-11-26 10:41:53,991 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502700
2023-11-26 10:42:10,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3351380.0, ans=0.125
2023-11-26 10:42:26,556 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9750, loss[loss=0.05643, simple_loss=0.0721, pruned_loss=0.01126, audio_tagging_loss=0.009116, over 15086.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08975, pruned_loss=0.01239, audio_tagging_loss=0.008797, over 3035364.16 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 32.0
2023-11-26 10:42:39,065 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.36 vs. limit=15.0
2023-11-26 10:42:39,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3351580.0, ans=0.125
2023-11-26 10:42:42,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3351580.0, ans=0.0
2023-11-26 10:42:49,938 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502750
2023-11-26 10:43:02,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3351713.3333333335, ans=0.125
2023-11-26 10:43:03,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3351713.3333333335, ans=0.5
2023-11-26 10:43:10,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3351780.0, ans=0.2
2023-11-26 10:43:11,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3351780.0, ans=0.2
2023-11-26 10:43:22,283 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9800, loss[loss=0.06537, simple_loss=0.0828, pruned_loss=0.01283, audio_tagging_loss=0.01114, over 14210.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09, pruned_loss=0.01251, audio_tagging_loss=0.00873, over 3039787.21 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 32.0
2023-11-26 10:43:23,299 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 9.016e+01 9.407e+01 1.014e+02 1.286e+02, threshold=1.881e+02, percent-clipped=0.0
2023-11-26 10:43:45,585 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502800
2023-11-26 10:43:49,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3351980.0, ans=0.0
2023-11-26 10:43:49,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3351980.0, ans=0.0
2023-11-26 10:43:53,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3351980.0, ans=0.0
2023-11-26 10:44:05,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3352046.6666666665, ans=0.0
2023-11-26 10:44:14,071 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 10:44:18,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3352180.0, ans=0.2
2023-11-26 10:44:18,862 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9850, loss[loss=0.05964, simple_loss=0.08118, pruned_loss=0.01208, audio_tagging_loss=0.006966, over 16080.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09058, pruned_loss=0.01266, audio_tagging_loss=0.008606, over 3039118.40 frames. ], batch size: 62, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 10:44:25,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3352180.0, ans=0.0
2023-11-26 10:44:26,938 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.02 vs. limit=15.0
2023-11-26 10:44:41,937 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502850
2023-11-26 10:44:46,235 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.79 vs. limit=15.0
2023-11-26 10:44:54,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3352380.0, ans=0.125
2023-11-26 10:44:56,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3352380.0, ans=0.1
2023-11-26 10:45:01,627 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 10:45:09,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3352446.6666666665, ans=0.05
2023-11-26 10:45:13,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3352513.3333333335, ans=0.2
2023-11-26 10:45:14,244 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9900, loss[loss=0.08648, simple_loss=0.1308, pruned_loss=0.01615, audio_tagging_loss=0.004933, over 15747.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.09205, pruned_loss=0.01285, audio_tagging_loss=0.008536, over 3037989.58 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 10:45:16,979 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.238e+01 8.825e+01 9.361e+01 1.007e+02 1.352e+02, threshold=1.872e+02, percent-clipped=0.0
2023-11-26 10:45:21,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3352513.3333333335, ans=0.125
2023-11-26 10:45:21,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3352513.3333333335, ans=0.07
2023-11-26 10:45:24,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3352580.0, ans=0.0
2023-11-26 10:45:38,572 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502900
2023-11-26 10:45:41,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3352646.6666666665, ans=0.0
2023-11-26 10:45:42,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3352646.6666666665, ans=0.125
2023-11-26 10:45:49,236 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.85 vs. limit=15.0
2023-11-26 10:45:56,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3352713.3333333335, ans=0.125
2023-11-26 10:46:09,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3352780.0, ans=0.125
2023-11-26 10:46:11,023 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 9950, loss[loss=0.05817, simple_loss=0.079, pruned_loss=0.009972, audio_tagging_loss=0.008696, over 15419.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09157, pruned_loss=0.01264, audio_tagging_loss=0.008533, over 3038041.16 frames. ], batch size: 59, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 10:46:19,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3352846.6666666665, ans=0.125
2023-11-26 10:46:26,466 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.75 vs. limit=15.0
2023-11-26 10:46:34,434 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 502950
2023-11-26 10:46:40,470 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.11 vs. limit=22.5
2023-11-26 10:46:49,397 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.24 vs. limit=15.0
2023-11-26 10:47:00,770 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.10 vs. limit=15.0
2023-11-26 10:47:06,956 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10000, loss[loss=0.05167, simple_loss=0.06669, pruned_loss=0.00675, audio_tagging_loss=0.01157, over 15853.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09162, pruned_loss=0.01262, audio_tagging_loss=0.00849, over 3044308.17 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 32.0
2023-11-26 10:47:09,651 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.184e+01 8.822e+01 9.378e+01 1.009e+02 1.316e+02, threshold=1.876e+02, percent-clipped=0.0
2023-11-26 10:47:09,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3353180.0, ans=0.125
2023-11-26 10:47:28,982 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.72 vs. limit=15.0
2023-11-26 10:47:30,544 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503000
2023-11-26 10:47:45,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3353380.0, ans=0.125
2023-11-26 10:47:51,925 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.30 vs. limit=12.0
2023-11-26 10:48:03,175 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10050, loss[loss=0.05909, simple_loss=0.09056, pruned_loss=0.007471, audio_tagging_loss=0.006333, over 15084.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.0908, pruned_loss=0.01244, audio_tagging_loss=0.008614, over 3049882.31 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 32.0
2023-11-26 10:48:06,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3353513.3333333335, ans=0.1
2023-11-26 10:48:14,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3353580.0, ans=0.125
2023-11-26 10:48:21,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3353580.0, ans=0.0
2023-11-26 10:48:27,020 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503050
2023-11-26 10:48:59,508 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10100, loss[loss=0.08176, simple_loss=0.111, pruned_loss=0.01721, audio_tagging_loss=0.009049, over 15434.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09069, pruned_loss=0.01239, audio_tagging_loss=0.008632, over 3050752.62 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 10:49:02,638 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.273e+01 8.507e+01 9.128e+01 1.020e+02 1.362e+02, threshold=1.826e+02, percent-clipped=0.0
2023-11-26 10:49:22,975 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503100
2023-11-26 10:49:26,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3353980.0, ans=0.125
2023-11-26 10:49:31,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3354046.6666666665, ans=0.125
2023-11-26 10:49:45,225 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 10:49:55,343 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10150, loss[loss=0.04574, simple_loss=0.05623, pruned_loss=0.00676, audio_tagging_loss=0.01087, over 14020.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08981, pruned_loss=0.01212, audio_tagging_loss=0.008705, over 3054096.26 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 10:50:01,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3354180.0, ans=0.0
2023-11-26 10:50:09,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3354246.6666666665, ans=0.0
2023-11-26 10:50:14,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3354246.6666666665, ans=0.0
2023-11-26 10:50:14,301 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=15.0
2023-11-26 10:50:18,276 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503150
2023-11-26 10:50:22,567 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 10:50:23,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3354313.3333333335, ans=10.0
2023-11-26 10:50:23,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3354313.3333333335, ans=0.2
2023-11-26 10:50:50,981 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10200, loss[loss=0.06984, simple_loss=0.09158, pruned_loss=0.01386, audio_tagging_loss=0.01019, over 15494.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09035, pruned_loss=0.01238, audio_tagging_loss=0.008762, over 3059081.56 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 10:50:54,052 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.608e+01 9.223e+01 1.008e+02 1.287e+02, threshold=1.845e+02, percent-clipped=0.0
2023-11-26 10:51:02,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3354580.0, ans=0.0
2023-11-26 10:51:13,274 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 10:51:14,385 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503200
2023-11-26 10:51:28,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3354713.3333333335, ans=0.125
2023-11-26 10:51:35,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3354780.0, ans=0.0
2023-11-26 10:51:41,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3354780.0, ans=0.035
2023-11-26 10:51:46,352 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10250, loss[loss=0.07094, simple_loss=0.09982, pruned_loss=0.01302, audio_tagging_loss=0.00801, over 15737.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.0906, pruned_loss=0.01236, audio_tagging_loss=0.008838, over 3056125.28 frames. ], batch size: 61, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 10:51:53,380 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.65 vs. limit=6.0
2023-11-26 10:52:05,616 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.39 vs. limit=15.0
2023-11-26 10:52:10,978 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503250
2023-11-26 10:52:15,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3354980.0, ans=0.125
2023-11-26 10:52:27,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3355046.6666666665, ans=0.1
2023-11-26 10:52:29,579 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.44 vs. limit=10.0
2023-11-26 10:52:43,315 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10300, loss[loss=0.07624, simple_loss=0.1044, pruned_loss=0.01475, audio_tagging_loss=0.009295, over 15500.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09138, pruned_loss=0.01249, audio_tagging_loss=0.008789, over 3062395.24 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 10:52:44,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3355180.0, ans=0.0
2023-11-26 10:52:46,395 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.459e+01 8.732e+01 9.378e+01 1.015e+02 1.295e+02, threshold=1.876e+02, percent-clipped=0.0
2023-11-26 10:53:02,756 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.26 vs. limit=10.0
2023-11-26 10:53:03,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3355246.6666666665, ans=0.0
2023-11-26 10:53:06,348 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503300
2023-11-26 10:53:13,379 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0
2023-11-26 10:53:16,151 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.65 vs. limit=15.0
2023-11-26 10:53:24,424 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.36 vs. limit=12.0
2023-11-26 10:53:28,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3355446.6666666665, ans=0.125
2023-11-26 10:53:33,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3355446.6666666665, ans=0.0
2023-11-26 10:53:39,418 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10350, loss[loss=0.07642, simple_loss=0.1107, pruned_loss=0.01449, audio_tagging_loss=0.006597, over 16313.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09118, pruned_loss=0.01248, audio_tagging_loss=0.008847, over 3056835.17 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 10:53:39,418 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10350, loss[loss=0.07642, simple_loss=0.1107, pruned_loss=0.01449, audio_tagging_loss=0.006597, over 16313.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09118, pruned_loss=0.01248, audio_tagging_loss=0.008847, over 3056835.17 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 10:53:53,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3355580.0, ans=0.125
2023-11-26 10:54:01,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3355646.6666666665, ans=0.125
2023-11-26 10:54:02,165 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503350
2023-11-26 10:54:10,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3355646.6666666665, ans=0.07
2023-11-26 10:54:12,150 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.72 vs. limit=12.0
2023-11-26 10:54:14,854 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.26 vs. limit=15.0
2023-11-26 10:54:34,668 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10400, loss[loss=0.05648, simple_loss=0.07412, pruned_loss=0.008644, audio_tagging_loss=0.01077, over 15523.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09027, pruned_loss=0.01233, audio_tagging_loss=0.008931, over 3054785.68 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 32.0
2023-11-26 10:54:36,248 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.90 vs. limit=15.0
2023-11-26 10:54:37,762 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.157e+01 8.834e+01 9.411e+01 9.985e+01 1.468e+02, threshold=1.882e+02, percent-clipped=0.0
2023-11-26 10:54:54,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3355913.3333333335, ans=0.0
2023-11-26 10:54:58,692 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503400
2023-11-26 10:55:00,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3355980.0, ans=0.04949747468305833
2023-11-26 10:55:12,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3356046.6666666665, ans=0.0
2023-11-26 10:55:30,879 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10450, loss[loss=0.07278, simple_loss=0.1012, pruned_loss=0.01387, audio_tagging_loss=0.008327, over 14625.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08876, pruned_loss=0.01205, audio_tagging_loss=0.009047, over 3045624.88 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 32.0
2023-11-26 10:55:39,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3356180.0, ans=0.0
2023-11-26 10:55:54,543 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503450
2023-11-26 10:55:56,230 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.65 vs. limit=15.0
2023-11-26 10:56:15,404 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
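
The optim.py "Clipping_scale=2.0, grad-norm quartiles ..." records appear to summarize recent gradient norms as min/25%/median/75%/max, with the clipping threshold set to clipping_scale times the median; percent-clipped then reports how often that threshold was exceeded (it turns to 1.0 further down exactly where the max passes the threshold). Checking the record above:

    # Interpretation inferred from the logged numbers, not from optim.py.
    quartiles = [71.57, 88.34, 94.11, 99.85, 146.8]  # min/q1/median/q3/max
    clipping_scale = 2.0
    threshold = clipping_scale * quartiles[2]        # 2.0 * median
    assert round(threshold, 1) == 188.2              # log: threshold=1.882e+02
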
2023-11-26 10:56:27,222 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10500, loss[loss=0.07517, simple_loss=0.1008, pruned_loss=0.01674, audio_tagging_loss=0.008016, over 14431.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08844, pruned_loss=0.012, audio_tagging_loss=0.008908, over 3047502.17 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 32.0
2023-11-26 10:56:30,347 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.498e+01 8.592e+01 9.300e+01 9.951e+01 1.449e+02, threshold=1.860e+02, percent-clipped=0.0
2023-11-26 10:56:50,272 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503500
2023-11-26 10:57:07,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3356713.3333333335, ans=0.125
2023-11-26 10:57:09,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3356713.3333333335, ans=0.2
2023-11-26 10:57:15,336 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 10:57:16,423 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 10:57:17,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3356780.0, ans=0.125
2023-11-26 10:57:22,551 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10550, loss[loss=0.06615, simple_loss=0.09279, pruned_loss=0.01078, audio_tagging_loss=0.008977, over 15754.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08874, pruned_loss=0.01211, audio_tagging_loss=0.008809, over 3049485.74 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 32.0
2023-11-26 10:57:30,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3356846.6666666665, ans=0.1
2023-11-26 10:57:34,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3356913.3333333335, ans=0.125
2023-11-26 10:57:36,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3356913.3333333335, ans=0.125
2023-11-26 10:57:45,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3356980.0, ans=0.125
2023-11-26 10:57:47,206 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503550
2023-11-26 10:57:54,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3356980.0, ans=0.125
2023-11-26 10:57:59,378 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.26 vs. limit=15.0
2023-11-26 10:58:15,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3357113.3333333335, ans=0.1
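
The scaling.py "ScheduledFloat: name=..., batch_count=..., ans=..." records print the current value of a schedule attached to a module parameter (skip rates, balancer probabilities, dropout), indexed by a fractional batch_count. A minimal sketch of the assumed semantics, piecewise-linear interpolation between breakpoints (the real scaling.py implementation may differ):

    class ScheduledFloatSketch:
        """Piecewise-linear schedule over batch_count (assumed semantics)."""

        def __init__(self, *points):  # points: (batch_count, value) pairs
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # e.g. a skip rate annealed from 0.5 to 0.0 over the first 20k batches:
    skip_rate = ScheduledFloatSketch((0.0, 0.5), (20000.0, 0.0))
    print(skip_rate.value(3354313.33))  # 0.0, long past the final breakpoint
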
2023-11-26 10:58:18,535 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10600, loss[loss=0.05756, simple_loss=0.07836, pruned_loss=0.01032, audio_tagging_loss=0.008063, over 15257.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08892, pruned_loss=0.01221, audio_tagging_loss=0.008725, over 3045489.00 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 10:58:19,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3357180.0, ans=0.0
2023-11-26 10:58:21,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3357180.0, ans=0.125
2023-11-26 10:58:22,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3357180.0, ans=0.125
2023-11-26 10:58:22,921 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.05 vs. limit=6.0
2023-11-26 10:58:23,334 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.764e+01 8.911e+01 9.725e+01 1.038e+02 1.409e+02, threshold=1.945e+02, percent-clipped=0.0
2023-11-26 10:58:27,756 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=15.0
2023-11-26 10:58:41,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3357313.3333333335, ans=0.2
2023-11-26 10:58:42,558 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503600
2023-11-26 10:58:42,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3357313.3333333335, ans=0.0
2023-11-26 10:58:55,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3357380.0, ans=0.0
2023-11-26 10:58:56,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3357380.0, ans=0.0
2023-11-26 10:58:59,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3357380.0, ans=0.0
2023-11-26 10:59:07,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3357446.6666666665, ans=0.1
2023-11-26 10:59:08,867 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.65 vs. limit=10.0
2023-11-26 10:59:09,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3357446.6666666665, ans=0.0
2023-11-26 10:59:11,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3357446.6666666665, ans=0.0
2023-11-26 10:59:14,304 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.93 vs. limit=15.0
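
The "Whitening: ... metric=... vs. limit=..." records compare a per-module statistic of the activation covariance against a limit; the whitening penalty appears to act only while the metric exceeds its limit, which most records here stay under. One plausible form of such a metric, for illustration only (the exact statistic in scaling.py may differ):

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels) activations of one whitening group.
        # mean(eig^2) / mean(eig)^2 over covariance eigenvalues: 1.0 for a
        # perfectly white covariance, growing as the spectrum spreads out.
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return float((eigs ** 2).mean() / eigs.mean() ** 2)

    print(whitening_metric(torch.randn(1000, 384)))  # close to 1.0
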
2023-11-26 10:59:15,789 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10650, loss[loss=0.0869, simple_loss=0.1294, pruned_loss=0.01692, audio_tagging_loss=0.005273, over 16423.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08943, pruned_loss=0.01232, audio_tagging_loss=0.0087, over 3042440.52 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 10:59:26,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3357580.0, ans=0.125
2023-11-26 10:59:38,861 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503650
2023-11-26 10:59:38,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3357646.6666666665, ans=0.125
2023-11-26 10:59:50,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3357713.3333333335, ans=0.0
2023-11-26 10:59:53,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3357713.3333333335, ans=0.0
2023-11-26 11:00:01,896 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.00 vs. limit=15.0
2023-11-26 11:00:09,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3357846.6666666665, ans=0.125
2023-11-26 11:00:10,590 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10700, loss[loss=0.07206, simple_loss=0.08627, pruned_loss=0.01635, audio_tagging_loss=0.01258, over 16100.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08908, pruned_loss=0.01223, audio_tagging_loss=0.008723, over 3046994.64 frames. ], batch size: 64, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 11:00:10,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3357846.6666666665, ans=0.015
2023-11-26 11:00:14,917 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.195e+01 8.978e+01 9.499e+01 1.034e+02 2.026e+02, threshold=1.900e+02, percent-clipped=1.0
2023-11-26 11:00:20,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3357846.6666666665, ans=0.125
2023-11-26 11:00:21,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3357913.3333333335, ans=0.125
2023-11-26 11:00:34,244 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503700
2023-11-26 11:00:35,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3357980.0, ans=0.1
2023-11-26 11:01:06,433 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10750, loss[loss=0.07227, simple_loss=0.1063, pruned_loss=0.01083, audio_tagging_loss=0.008313, over 15555.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.0889, pruned_loss=0.0121, audio_tagging_loss=0.00875, over 3050767.90 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 11:01:20,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3358246.6666666665, ans=0.125
2023-11-26 11:01:30,441 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503750
2023-11-26 11:01:34,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3358313.3333333335, ans=0.1
2023-11-26 11:01:35,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3358313.3333333335, ans=0.125
2023-11-26 11:01:40,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3358380.0, ans=0.0
2023-11-26 11:01:40,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3358380.0, ans=0.2
2023-11-26 11:01:46,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3358380.0, ans=0.125
2023-11-26 11:01:47,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3358380.0, ans=0.125
2023-11-26 11:02:02,705 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10800, loss[loss=0.0781, simple_loss=0.1061, pruned_loss=0.01637, audio_tagging_loss=0.008677, over 16149.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08882, pruned_loss=0.01203, audio_tagging_loss=0.008823, over 3045585.28 frames. ], batch size: 59, lr: 1.60e-03, grad_scale: 32.0
2023-11-26 11:02:07,438 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.700e+01 8.688e+01 9.211e+01 9.904e+01 2.001e+02, threshold=1.842e+02, percent-clipped=1.0
2023-11-26 11:02:25,495 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503800
2023-11-26 11:02:28,009 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.15 vs. limit=6.0
2023-11-26 11:02:29,937 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.31 vs. limit=15.0
2023-11-26 11:02:35,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3358713.3333333335, ans=0.0
2023-11-26 11:02:51,839 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.13 vs. limit=22.5
2023-11-26 11:02:56,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3358780.0, ans=0.125
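
The grad_scale column flips between 32.0 and 16.0 here (and reaches 8.0 further down): it is the fp16 loss scale, lowered after overflowing steps and raised back after a stretch of stable ones. A sketch of the standard torch.cuda.amp pattern; the constructor arguments below are illustrative, not values read from this run:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=16.0,     # assumed starting scale
        growth_factor=2.0,   # doubles after growth_interval stable steps
        backoff_factor=0.5,  # halves when gradients contain inf/nan
        growth_interval=2000,
    )
    # per training step:
    #   with torch.cuda.amp.autocast():
    #       loss = compute_loss(batch)
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()
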
2023-11-26 11:02:58,738 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10850, loss[loss=0.07489, simple_loss=0.1093, pruned_loss=0.01252, audio_tagging_loss=0.007727, over 14738.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08907, pruned_loss=0.01197, audio_tagging_loss=0.008756, over 3050076.55 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 32.0
2023-11-26 11:03:04,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3358846.6666666665, ans=0.0
2023-11-26 11:03:17,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3358913.3333333335, ans=0.05
2023-11-26 11:03:17,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3358913.3333333335, ans=0.125
2023-11-26 11:03:22,355 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503850
2023-11-26 11:03:40,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3359046.6666666665, ans=0.125
2023-11-26 11:03:51,970 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 11:03:54,708 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10900, loss[loss=0.049, simple_loss=0.06788, pruned_loss=0.006869, audio_tagging_loss=0.008194, over 14187.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08901, pruned_loss=0.01179, audio_tagging_loss=0.008774, over 3051658.79 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 32.0
2023-11-26 11:03:58,887 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.407e+01 8.941e+01 9.584e+01 1.034e+02 1.250e+02, threshold=1.917e+02, percent-clipped=0.0
2023-11-26 11:04:01,641 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.14 vs. limit=15.0
2023-11-26 11:04:14,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3359246.6666666665, ans=0.0
2023-11-26 11:04:18,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3359313.3333333335, ans=0.125
2023-11-26 11:04:18,862 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503900
2023-11-26 11:04:50,466 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 10950, loss[loss=0.05888, simple_loss=0.08674, pruned_loss=0.00786, audio_tagging_loss=0.007651, over 14815.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08974, pruned_loss=0.01207, audio_tagging_loss=0.008785, over 3056757.49 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 32.0
2023-11-26 11:04:50,883 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.32 vs. limit=22.5
2023-11-26 11:04:56,689 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.05 vs. limit=15.0
2023-11-26 11:05:11,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3359646.6666666665, ans=0.0
2023-11-26 11:05:13,774 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 503950
2023-11-26 11:05:19,135 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.76 vs. limit=12.0
2023-11-26 11:05:23,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3359713.3333333335, ans=0.125
2023-11-26 11:05:46,839 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11000, loss[loss=0.07625, simple_loss=0.1036, pruned_loss=0.01388, audio_tagging_loss=0.01056, over 14297.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08949, pruned_loss=0.01218, audio_tagging_loss=0.008843, over 3053576.86 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 11:05:52,189 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.199e+01 8.584e+01 9.485e+01 1.002e+02 1.136e+02, threshold=1.897e+02, percent-clipped=0.0
2023-11-26 11:05:56,472 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 11:05:56,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3359913.3333333335, ans=0.125
2023-11-26 11:06:00,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3359913.3333333335, ans=0.0
2023-11-26 11:06:05,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3359913.3333333335, ans=0.0
2023-11-26 11:06:06,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3359913.3333333335, ans=0.0
2023-11-26 11:06:10,152 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504000
2023-11-26 11:06:11,453 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-504000.pt
2023-11-26 11:06:26,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3360046.6666666665, ans=0.2
2023-11-26 11:06:31,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3360046.6666666665, ans=0.0
2023-11-26 11:06:40,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3360113.3333333335, ans=0.0
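
The checkpoint.py record above fires at batch 504000 and writes a batch-numbered checkpoint-504000.pt, i.e. checkpoints are saved on a fixed batch interval alongside the per-epoch files. A sketch of that pattern (the interval of 4000 is an assumption; only the multiple itself is from the log):

    def should_save_batch_checkpoint(batch_idx: int, every_n: int = 4000) -> bool:
        """True on batches like 504000 that get a checkpoint-<n>.pt file."""
        return batch_idx > 0 and batch_idx % every_n == 0

    assert should_save_batch_checkpoint(504000)
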
2023-11-26 11:06:44,384 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11050, loss[loss=0.08483, simple_loss=0.1239, pruned_loss=0.01539, audio_tagging_loss=0.007484, over 15814.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09017, pruned_loss=0.0123, audio_tagging_loss=0.008913, over 3053316.37 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 11:07:08,392 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504050
2023-11-26 11:07:13,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3360313.3333333335, ans=0.125
2023-11-26 11:07:18,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3360380.0, ans=0.1
2023-11-26 11:07:24,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3360380.0, ans=0.035
2023-11-26 11:07:40,684 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11100, loss[loss=0.07762, simple_loss=0.1028, pruned_loss=0.01492, audio_tagging_loss=0.01128, over 15369.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09039, pruned_loss=0.01254, audio_tagging_loss=0.00905, over 3045383.21 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 11:07:46,437 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.379e+01 8.699e+01 9.322e+01 9.971e+01 1.375e+02, threshold=1.864e+02, percent-clipped=0.0
2023-11-26 11:08:04,219 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504100
2023-11-26 11:08:36,571 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11150, loss[loss=0.06385, simple_loss=0.09291, pruned_loss=0.00959, audio_tagging_loss=0.007803, over 15504.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09091, pruned_loss=0.01269, audio_tagging_loss=0.009004, over 3045242.61 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 8.0
2023-11-26 11:08:41,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3360846.6666666665, ans=0.035
2023-11-26 11:08:59,676 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.26 vs. limit=12.0
2023-11-26 11:09:00,109 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504150
2023-11-26 11:09:03,354 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.66 vs. limit=22.5
2023-11-26 11:09:10,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3361046.6666666665, ans=0.1
2023-11-26 11:09:11,378 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.66 vs. limit=15.0
2023-11-26 11:09:27,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3361113.3333333335, ans=0.125
2023-11-26 11:09:32,604 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11200, loss[loss=0.06033, simple_loss=0.07656, pruned_loss=0.01295, audio_tagging_loss=0.009101, over 14470.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09066, pruned_loss=0.01271, audio_tagging_loss=0.009104, over 3054215.65 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 11:09:39,459 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.816e+01 8.765e+01 9.322e+01 9.953e+01 1.270e+02, threshold=1.864e+02, percent-clipped=0.0
2023-11-26 11:09:46,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3361246.6666666665, ans=0.2
2023-11-26 11:09:53,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3361246.6666666665, ans=0.125
2023-11-26 11:09:56,420 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504200
2023-11-26 11:10:08,849 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.19 vs. limit=6.0
2023-11-26 11:10:09,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3361380.0, ans=0.04949747468305833
2023-11-26 11:10:21,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3361446.6666666665, ans=0.125
2023-11-26 11:10:28,730 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11250, loss[loss=0.05785, simple_loss=0.07722, pruned_loss=0.01052, audio_tagging_loss=0.008721, over 15487.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.08953, pruned_loss=0.01251, audio_tagging_loss=0.00901, over 3053513.49 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 11:10:51,749 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504250
2023-11-26 11:10:57,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3361646.6666666665, ans=0.09899494936611666
2023-11-26 11:11:20,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3361780.0, ans=0.0
2023-11-26 11:11:24,496 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11300, loss[loss=0.06396, simple_loss=0.08078, pruned_loss=0.01224, audio_tagging_loss=0.01133, over 14544.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.08988, pruned_loss=0.01247, audio_tagging_loss=0.008826, over 3051394.68 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 11:11:24,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3361846.6666666665, ans=0.2
2023-11-26 11:11:30,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3361846.6666666665, ans=0.07
2023-11-26 11:11:30,896 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.077e+01 8.741e+01 9.355e+01 1.007e+02 1.157e+02, threshold=1.871e+02, percent-clipped=0.0
2023-11-26 11:11:33,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3361846.6666666665, ans=0.125
2023-11-26 11:11:46,066 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 11:11:48,057 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504300
2023-11-26 11:12:08,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3362113.3333333335, ans=0.09899494936611666
2023-11-26 11:12:14,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3362113.3333333335, ans=0.1
2023-11-26 11:12:16,476 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.54 vs. limit=10.0
2023-11-26 11:12:20,007 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11350, loss[loss=0.0645, simple_loss=0.09295, pruned_loss=0.01103, audio_tagging_loss=0.007, over 17608.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.0898, pruned_loss=0.01231, audio_tagging_loss=0.008719, over 3057340.58 frames. ], batch size: 67, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 11:12:25,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3362180.0, ans=0.1
2023-11-26 11:12:38,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3362246.6666666665, ans=0.0
2023-11-26 11:12:44,469 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504350
2023-11-26 11:13:05,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3362446.6666666665, ans=0.125
2023-11-26 11:13:12,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3362446.6666666665, ans=0.0
2023-11-26 11:13:15,820 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11400, loss[loss=0.07511, simple_loss=0.09975, pruned_loss=0.01754, audio_tagging_loss=0.00769, over 15742.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.0904, pruned_loss=0.01245, audio_tagging_loss=0.00862, over 3050597.48 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 11:13:16,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3362513.3333333335, ans=0.125
2023-11-26 11:13:23,575 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.034e+01 8.698e+01 9.567e+01 1.043e+02 1.467e+02, threshold=1.913e+02, percent-clipped=0.0
2023-11-26 11:13:29,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3362580.0, ans=0.125
2023-11-26 11:13:39,030 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504400
2023-11-26 11:13:44,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3362646.6666666665, ans=0.125
2023-11-26 11:13:56,731 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=15.0
2023-11-26 11:14:12,480 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11450, loss[loss=0.04601, simple_loss=0.05642, pruned_loss=0.01044, audio_tagging_loss=0.00737, over 13879.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09012, pruned_loss=0.01238, audio_tagging_loss=0.008536, over 3045989.48 frames. ], batch size: 53, lr: 1.60e-03, grad_scale: 8.0
2023-11-26 11:14:27,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3362913.3333333335, ans=0.035
2023-11-26 11:14:35,878 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504450
2023-11-26 11:14:52,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3363046.6666666665, ans=0.0
2023-11-26 11:15:06,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3363180.0, ans=0.125
2023-11-26 11:15:07,662 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11500, loss[loss=0.06371, simple_loss=0.08641, pruned_loss=0.01075, audio_tagging_loss=0.009754, over 15196.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08883, pruned_loss=0.01208, audio_tagging_loss=0.008551, over 3040242.48 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 8.0
2023-11-26 11:15:12,513 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.32 vs. limit=15.0
2023-11-26 11:15:15,052 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.382e+01 8.706e+01 9.345e+01 1.007e+02 1.360e+02, threshold=1.869e+02, percent-clipped=0.0
2023-11-26 11:15:31,723 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504500
2023-11-26 11:15:36,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3363313.3333333335, ans=0.125
2023-11-26 11:15:51,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3363446.6666666665, ans=0.1
2023-11-26 11:16:03,942 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11550, loss[loss=0.06416, simple_loss=0.08606, pruned_loss=0.01165, audio_tagging_loss=0.009479, over 15611.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08979, pruned_loss=0.01221, audio_tagging_loss=0.008547, over 3043241.48 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 8.0
2023-11-26 11:16:24,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3363580.0, ans=0.125
2023-11-26 11:16:27,548 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504550
2023-11-26 11:16:38,075 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 11:16:40,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3363713.3333333335, ans=0.125
2023-11-26 11:16:51,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3363780.0, ans=0.0
2023-11-26 11:16:51,966 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.83 vs. limit=15.0
2023-11-26 11:16:56,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3363780.0, ans=0.2
2023-11-26 11:17:00,350 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11600, loss[loss=0.04434, simple_loss=0.0547, pruned_loss=0.007095, audio_tagging_loss=0.009895, over 16003.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.0904, pruned_loss=0.01242, audio_tagging_loss=0.008534, over 3044288.64 frames. ], batch size: 62, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 11:17:07,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3363846.6666666665, ans=0.0
2023-11-26 11:17:08,666 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.954e+01 8.748e+01 9.551e+01 1.048e+02 1.358e+02, threshold=1.910e+02, percent-clipped=0.0
2023-11-26 11:17:13,533 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.81 vs. limit=15.0
2023-11-26 11:17:21,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3363980.0, ans=0.5
2023-11-26 11:17:22,568 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504600
2023-11-26 11:17:29,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3363980.0, ans=0.1
2023-11-26 11:17:32,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3364046.6666666665, ans=0.125
2023-11-26 11:17:55,416 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11650, loss[loss=0.06078, simple_loss=0.073, pruned_loss=0.01435, audio_tagging_loss=0.00993, over 14189.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08997, pruned_loss=0.01239, audio_tagging_loss=0.008637, over 3045977.99 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 8.0
2023-11-26 11:18:01,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3364180.0, ans=0.125
2023-11-26 11:18:18,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3364313.3333333335, ans=0.0
2023-11-26 11:18:19,667 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504650
2023-11-26 11:18:28,152 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.99 vs. limit=22.5
2023-11-26 11:18:31,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3364380.0, ans=0.0
2023-11-26 11:18:35,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3364380.0, ans=0.0
2023-11-26 11:18:38,996 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=15.0
2023-11-26 11:18:40,815 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.53 vs. limit=12.0
2023-11-26 11:18:43,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3364446.6666666665, ans=0.125
2023-11-26 11:18:51,429 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11700, loss[loss=0.07281, simple_loss=0.1021, pruned_loss=0.01243, audio_tagging_loss=0.009315, over 15791.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09091, pruned_loss=0.01253, audio_tagging_loss=0.008626, over 3056585.32 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 8.0
2023-11-26 11:19:00,939 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.138e+01 8.724e+01 9.353e+01 9.996e+01 1.834e+02, threshold=1.871e+02, percent-clipped=0.0
2023-11-26 11:19:09,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3364580.0, ans=0.1
2023-11-26 11:19:12,716 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.63 vs. limit=22.5
2023-11-26 11:19:15,250 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504700
2023-11-26 11:19:18,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3364646.6666666665, ans=0.125
2023-11-26 11:19:26,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3364713.3333333335, ans=0.125
2023-11-26 11:19:32,398 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.07 vs. limit=22.5
2023-11-26 11:19:39,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3364780.0, ans=0.125
2023-11-26 11:19:47,703 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11750, loss[loss=0.06492, simple_loss=0.08121, pruned_loss=0.01311, audio_tagging_loss=0.01121, over 14910.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09088, pruned_loss=0.01252, audio_tagging_loss=0.008665, over 3057655.90 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 8.0
2023-11-26 11:19:50,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3364846.6666666665, ans=0.07
2023-11-26 11:20:04,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3364913.3333333335, ans=0.05
2023-11-26 11:20:08,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3364980.0, ans=0.125
2023-11-26 11:20:09,221 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.97 vs. limit=15.0
2023-11-26 11:20:10,583 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504750
2023-11-26 11:20:31,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3365113.3333333335, ans=0.0
2023-11-26 11:20:38,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3365113.3333333335, ans=0.0
2023-11-26 11:20:41,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3365113.3333333335, ans=0.0
2023-11-26 11:20:43,312 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11800, loss[loss=0.06612, simple_loss=0.09327, pruned_loss=0.009891, audio_tagging_loss=0.009591, over 15241.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.0909, pruned_loss=0.01254, audio_tagging_loss=0.008729, over 3052210.54 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 8.0
2023-11-26 11:20:43,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3365180.0, ans=0.09899494936611666
2023-11-26 11:20:50,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3365180.0, ans=0.125
2023-11-26 11:20:51,616 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 8.955e+01 9.712e+01 1.042e+02 1.352e+02, threshold=1.942e+02, percent-clipped=0.0
2023-11-26 11:20:53,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3365246.6666666665, ans=0.1
2023-11-26 11:20:56,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3365246.6666666665, ans=0.0
2023-11-26 11:21:06,178 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.93 vs. limit=10.0
2023-11-26 11:21:06,723 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504800
2023-11-26 11:21:19,876 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 11:21:29,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3365446.6666666665, ans=0.125
2023-11-26 11:21:39,228 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11850, loss[loss=0.0763, simple_loss=0.1056, pruned_loss=0.01616, audio_tagging_loss=0.007322, over 14275.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.0907, pruned_loss=0.01249, audio_tagging_loss=0.008765, over 3054352.72 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 8.0
2023-11-26 11:21:39,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3365513.3333333335, ans=0.1
2023-11-26 11:21:42,171 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.14 vs. limit=15.0
2023-11-26 11:22:03,165 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504850
2023-11-26 11:22:09,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3365646.6666666665, ans=0.1
2023-11-26 11:22:15,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3365713.3333333335, ans=0.125
2023-11-26 11:22:22,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3365780.0, ans=0.1
2023-11-26 11:22:34,567 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11900, loss[loss=0.05945, simple_loss=0.07892, pruned_loss=0.006455, audio_tagging_loss=0.01353, over 15426.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09136, pruned_loss=0.01258, audio_tagging_loss=0.008778, over 3053111.53 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 8.0
2023-11-26 11:22:44,164 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.301e+01 8.882e+01 9.383e+01 1.003e+02 1.296e+02, threshold=1.877e+02, percent-clipped=0.0
2023-11-26 11:22:47,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3365913.3333333335, ans=0.125
2023-11-26 11:22:58,065 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504900
2023-11-26 11:22:58,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3365980.0, ans=0.125
2023-11-26 11:23:04,250 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.79 vs. limit=12.0
2023-11-26 11:23:05,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3365980.0, ans=0.0
2023-11-26 11:23:25,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3366113.3333333335, ans=0.125
2023-11-26 11:23:29,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3366180.0, ans=0.125
2023-11-26 11:23:30,284 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 11950, loss[loss=0.0636, simple_loss=0.0875, pruned_loss=0.01077, audio_tagging_loss=0.009075, over 15808.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09097, pruned_loss=0.01258, audio_tagging_loss=0.008839, over 3059742.79 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 8.0
2023-11-26 11:23:37,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3366180.0, ans=0.125
2023-11-26 11:23:40,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3366246.6666666665, ans=0.2
2023-11-26 11:23:53,606 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 504950
2023-11-26 11:24:18,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3366446.6666666665, ans=0.125
2023-11-26 11:24:24,570 INFO [train_asr.py:1235] (0/4) Epoch 42, batch 12000, loss[loss=0.04812, simple_loss=0.06411, pruned_loss=0.006744, audio_tagging_loss=0.009322, over 15044.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09085, pruned_loss=0.0125, audio_tagging_loss=0.00902, over 3062355.72 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0
2023-11-26 11:24:24,572 INFO [train_asr.py:1258] (0/4) Computing validation loss
2023-11-26 11:24:36,261 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8266, 4.5564, 3.9981, 4.3624], device='cuda:0')
2023-11-26 11:24:48,608 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.4644, 3.7879, 3.0324, 3.8775], device='cuda:0')
2023-11-26 11:24:57,251 INFO [train_asr.py:1267] (0/4) Epoch 42, validation: loss=0.05796, simple_loss=0.05063, pruned_loss=0.005274, audio_tagging_loss=0.02738, over 4681554.00 frames.
2023-11-26 11:24:57,251 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB
2023-11-26 11:25:04,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3366513.3333333335, ans=0.95
2023-11-26 11:25:05,475 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.022e+01 8.933e+01 9.493e+01 1.025e+02 1.345e+02, threshold=1.899e+02, percent-clipped=0.0
2023-11-26 11:25:14,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3366580.0, ans=0.0
2023-11-26 11:25:19,323 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505000
2023-11-26 11:25:23,753 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-42.pt
2023-11-26 11:25:50,628 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 0, loss[loss=0.06258, simple_loss=0.06628, pruned_loss=0.00835, audio_tagging_loss=0.02109, over 14159.00 frames. ], tot_loss[loss=0.06258, simple_loss=0.06628, pruned_loss=0.00835, audio_tagging_loss=0.02109, over 14159.00 frames. ], batch size: 54, lr: 1.58e-03, grad_scale: 32.0
2023-11-26 11:25:50,630 INFO [train_asr.py:1258] (0/4) Computing validation loss
2023-11-26 11:26:02,925 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.7957, 4.6072, 4.4028, 4.4246], device='cuda:0')
2023-11-26 11:26:21,929 INFO [train_asr.py:1267] (0/4) Epoch 43, validation: loss=0.05779, simple_loss=0.05063, pruned_loss=0.005275, audio_tagging_loss=0.0272, over 4681554.00 frames.
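
Batch 12000 closes the epoch: a validation pass is computed (note the validation audio_tagging_loss of 0.02738, roughly three times the training value), epoch-42.pt is written, and Epoch 43 begins with the learning rate stepping from 1.60e-03 down to 1.58e-03. The epoch-wise step is consistent with an Eden-style schedule that decays in both batches and epochs; the sketch below uses assumed constants for illustration and is not read from this run:

    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        # Eden-style decay in the batch and epoch dimensions
        # (base_lr, lr_batches and lr_epochs are assumptions).
        batch_term = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_term = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_term * epoch_term

    print(eden_lr(0.045, 505000, 43))  # ~1.6e-03 with these assumed constants
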
2023-11-26 11:26:21,930 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB
2023-11-26 11:26:40,317 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.58 vs. limit=12.0
2023-11-26 11:26:49,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3366806.6666666665, ans=0.125
2023-11-26 11:27:03,009 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.85 vs. limit=15.0
2023-11-26 11:27:14,018 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505050
2023-11-26 11:27:17,143 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 50, loss[loss=0.08644, simple_loss=0.1204, pruned_loss=0.0148, audio_tagging_loss=0.01144, over 16086.00 frames. ], tot_loss[loss=0.07588, simple_loss=0.09321, pruned_loss=0.01265, audio_tagging_loss=0.01662, over 692403.93 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 16.0
2023-11-26 11:27:27,932 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.86 vs. limit=15.0
2023-11-26 11:27:30,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3367073.3333333335, ans=0.0
2023-11-26 11:27:32,904 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.18 vs. limit=15.0
2023-11-26 11:27:33,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3367073.3333333335, ans=0.1
2023-11-26 11:27:33,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3367073.3333333335, ans=0.0
2023-11-26 11:27:35,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3367073.3333333335, ans=0.1
2023-11-26 11:27:56,528 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.204e+01 9.484e+01 1.020e+02 1.096e+02 2.411e+02, threshold=2.041e+02, percent-clipped=1.0
2023-11-26 11:28:06,755 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.21 vs. limit=10.0
2023-11-26 11:28:09,871 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505100
2023-11-26 11:28:13,117 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 100, loss[loss=0.07097, simple_loss=0.08737, pruned_loss=0.01247, audio_tagging_loss=0.01481, over 15091.00 frames. ], tot_loss[loss=0.07496, simple_loss=0.09219, pruned_loss=0.01266, audio_tagging_loss=0.01621, over 1214021.82 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0
2023-11-26 11:28:31,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3367406.6666666665, ans=0.2
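
After the epoch reset the tot_loss averaging window grows sub-linearly: 692403.93 frames at batch 50 but only 1214021.82 at batch 100, less than double. That shape matches an exponentially decaying running sum rather than a plain cumulative count; a sketch where both constants are assumptions chosen to reproduce the rough magnitudes:

    decay = 1.0 - 1.0 / 200.0    # assumed per-batch decay
    frames_per_batch = 15_600    # assumed average frames per batch

    window = 0.0
    for batch in range(1, 101):
        window = window * decay + frames_per_batch
        if batch in (50, 100):
            print(batch, round(window))  # ~692k and ~1.23M, near the log
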
2023-11-26 11:28:41,862 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.70 vs. limit=15.0
2023-11-26 11:28:45,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3367540.0, ans=0.0
2023-11-26 11:28:51,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3367540.0, ans=0.1
2023-11-26 11:29:01,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3367606.6666666665, ans=0.125
2023-11-26 11:29:06,280 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505150
2023-11-26 11:29:09,530 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 150, loss[loss=0.05752, simple_loss=0.0796, pruned_loss=0.007785, audio_tagging_loss=0.009934, over 15192.00 frames. ], tot_loss[loss=0.07206, simple_loss=0.09049, pruned_loss=0.01216, audio_tagging_loss=0.01466, over 1617126.88 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 8.0
2023-11-26 11:29:15,882 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.47 vs. limit=15.0
2023-11-26 11:29:25,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3367740.0, ans=0.125
2023-11-26 11:29:45,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3367873.3333333335, ans=0.0
2023-11-26 11:29:47,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3367873.3333333335, ans=0.025
2023-11-26 11:29:47,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3367873.3333333335, ans=0.1
2023-11-26 11:29:49,993 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.756e+01 9.187e+01 9.762e+01 1.032e+02 1.254e+02, threshold=1.952e+02, percent-clipped=0.0
2023-11-26 11:29:53,785 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.87 vs. limit=15.0
2023-11-26 11:29:54,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3367940.0, ans=0.07
2023-11-26 11:29:55,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3367940.0, ans=10.0
2023-11-26 11:30:02,307 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505200
2023-11-26 11:30:05,723 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 200, loss[loss=0.06799, simple_loss=0.08806, pruned_loss=0.01274, audio_tagging_loss=0.01122, over 14868.00 frames. ], tot_loss[loss=0.07143, simple_loss=0.09203, pruned_loss=0.0126, audio_tagging_loss=0.01281, over 1936448.45 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 8.0
limit=15.0 2023-11-26 11:30:14,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3368006.6666666665, ans=0.125 2023-11-26 11:30:20,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3368073.3333333335, ans=0.125 2023-11-26 11:30:21,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3368073.3333333335, ans=0.0 2023-11-26 11:30:32,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3368140.0, ans=0.0 2023-11-26 11:30:44,773 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.27 vs. limit=22.5 2023-11-26 11:30:47,015 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.60 vs. limit=15.0 2023-11-26 11:30:58,371 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505250 2023-11-26 11:31:02,026 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 250, loss[loss=0.0606, simple_loss=0.07866, pruned_loss=0.01111, audio_tagging_loss=0.01016, over 15627.00 frames. ], tot_loss[loss=0.06958, simple_loss=0.0913, pruned_loss=0.01237, audio_tagging_loss=0.01156, over 2178262.26 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:31:27,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3368473.3333333335, ans=0.125 2023-11-26 11:31:34,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3368540.0, ans=0.2 2023-11-26 11:31:36,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3368540.0, ans=0.125 2023-11-26 11:31:42,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3368540.0, ans=0.125 2023-11-26 11:31:43,455 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.725e+01 9.006e+01 9.717e+01 1.058e+02 1.490e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-26 11:31:54,692 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505300 2023-11-26 11:31:58,284 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 300, loss[loss=0.06197, simple_loss=0.0857, pruned_loss=0.009777, audio_tagging_loss=0.009337, over 13920.00 frames. ], tot_loss[loss=0.06805, simple_loss=0.08982, pruned_loss=0.0123, audio_tagging_loss=0.01084, over 2375040.20 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:32:17,249 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.43 vs. 
limit=15.0 2023-11-26 11:32:23,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3368806.6666666665, ans=0.1 2023-11-26 11:32:25,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3368806.6666666665, ans=0.125 2023-11-26 11:32:50,428 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505350 2023-11-26 11:32:54,041 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 350, loss[loss=0.08765, simple_loss=0.1146, pruned_loss=0.02133, audio_tagging_loss=0.009014, over 15349.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.08912, pruned_loss=0.0122, audio_tagging_loss=0.01034, over 2519084.67 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:33:01,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3369006.6666666665, ans=0.1 2023-11-26 11:33:16,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3369140.0, ans=0.125 2023-11-26 11:33:26,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3369206.6666666665, ans=0.0 2023-11-26 11:33:35,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3369206.6666666665, ans=0.125 2023-11-26 11:33:35,777 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.731e+01 9.201e+01 9.958e+01 1.413e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-26 11:33:41,658 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.23 vs. limit=15.0 2023-11-26 11:33:46,628 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505400 2023-11-26 11:33:50,620 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 400, loss[loss=0.05794, simple_loss=0.07823, pruned_loss=0.009744, audio_tagging_loss=0.009079, over 16110.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.0891, pruned_loss=0.01219, audio_tagging_loss=0.009931, over 2633545.19 frames. ], batch size: 62, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:34:33,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3369540.0, ans=0.0 2023-11-26 11:34:40,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3369606.6666666665, ans=0.125 2023-11-26 11:34:42,277 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.12 vs. limit=22.5 2023-11-26 11:34:43,263 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505450 2023-11-26 11:34:46,996 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 450, loss[loss=0.0634, simple_loss=0.08741, pruned_loss=0.01067, audio_tagging_loss=0.009015, over 14786.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08871, pruned_loss=0.01207, audio_tagging_loss=0.009724, over 2725978.30 frames. 
], batch size: 54, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:34:51,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3369673.3333333335, ans=0.125 2023-11-26 11:35:03,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3369740.0, ans=0.125 2023-11-26 11:35:24,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3369873.3333333335, ans=0.1 2023-11-26 11:35:27,885 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.612e+01 8.871e+01 9.451e+01 1.026e+02 1.404e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-26 11:35:37,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3369940.0, ans=0.125 2023-11-26 11:35:39,138 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505500 2023-11-26 11:35:42,265 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 500, loss[loss=0.07946, simple_loss=0.1034, pruned_loss=0.0178, audio_tagging_loss=0.009969, over 15161.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08746, pruned_loss=0.01197, audio_tagging_loss=0.009543, over 2800826.49 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:35:44,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3370006.6666666665, ans=0.125 2023-11-26 11:35:46,483 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.83 vs. limit=15.0 2023-11-26 11:36:05,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3370140.0, ans=0.1 2023-11-26 11:36:08,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3370140.0, ans=0.1 2023-11-26 11:36:19,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3370206.6666666665, ans=0.0 2023-11-26 11:36:24,511 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.81 vs. limit=15.0 2023-11-26 11:36:35,147 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505550 2023-11-26 11:36:38,280 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 550, loss[loss=0.04119, simple_loss=0.04558, pruned_loss=0.006252, audio_tagging_loss=0.01215, over 14105.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08912, pruned_loss=0.01221, audio_tagging_loss=0.009243, over 2853246.42 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:37:12,497 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.84 vs. 
limit=15.0 2023-11-26 11:37:19,740 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.474e+01 8.880e+01 9.459e+01 1.018e+02 1.226e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-26 11:37:24,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3370606.6666666665, ans=6.0 2023-11-26 11:37:31,040 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505600 2023-11-26 11:37:33,101 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.65 vs. limit=15.0 2023-11-26 11:37:34,737 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 600, loss[loss=0.07943, simple_loss=0.1043, pruned_loss=0.01981, audio_tagging_loss=0.00745, over 15846.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08911, pruned_loss=0.01221, audio_tagging_loss=0.00918, over 2902035.96 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:37:37,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3370673.3333333335, ans=0.2 2023-11-26 11:38:17,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3370873.3333333335, ans=0.2 2023-11-26 11:38:23,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3370940.0, ans=0.2 2023-11-26 11:38:27,355 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505650 2023-11-26 11:38:27,759 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.23 vs. limit=15.0 2023-11-26 11:38:30,490 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 650, loss[loss=0.06967, simple_loss=0.09775, pruned_loss=0.01214, audio_tagging_loss=0.008656, over 14758.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08923, pruned_loss=0.01217, audio_tagging_loss=0.009206, over 2934210.62 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:38:31,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3371006.6666666665, ans=0.05 2023-11-26 11:38:40,371 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:38:47,558 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.55 vs. 
limit=12.0 2023-11-26 11:38:49,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3371073.3333333335, ans=0.0 2023-11-26 11:38:52,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3371140.0, ans=0.125 2023-11-26 11:39:12,094 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.982e+01 8.620e+01 9.335e+01 1.001e+02 1.278e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-26 11:39:22,911 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505700 2023-11-26 11:39:25,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3371340.0, ans=0.125 2023-11-26 11:39:26,076 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 700, loss[loss=0.05799, simple_loss=0.08311, pruned_loss=0.01004, audio_tagging_loss=0.006405, over 14759.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08907, pruned_loss=0.01221, audio_tagging_loss=0.009126, over 2953569.25 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:39:28,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3371340.0, ans=0.1 2023-11-26 11:40:19,241 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505750 2023-11-26 11:40:22,311 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 750, loss[loss=0.06139, simple_loss=0.0834, pruned_loss=0.01034, audio_tagging_loss=0.009357, over 14126.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08945, pruned_loss=0.0122, audio_tagging_loss=0.009069, over 2970769.52 frames. ], batch size: 53, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:40:49,610 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.58 vs. limit=15.0 2023-11-26 11:40:51,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3371806.6666666665, ans=0.125 2023-11-26 11:40:54,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3371873.3333333335, ans=0.1 2023-11-26 11:41:01,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3371873.3333333335, ans=0.1 2023-11-26 11:41:03,803 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.272e+01 8.569e+01 9.106e+01 9.803e+01 1.361e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-26 11:41:15,672 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505800 2023-11-26 11:41:19,120 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 800, loss[loss=0.06118, simple_loss=0.0819, pruned_loss=0.01007, audio_tagging_loss=0.01016, over 14427.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08946, pruned_loss=0.01219, audio_tagging_loss=0.009057, over 2988247.24 frames. 
], batch size: 55, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:41:19,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3372006.6666666665, ans=0.125 2023-11-26 11:41:19,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3372006.6666666665, ans=0.125 2023-11-26 11:41:23,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3372006.6666666665, ans=0.0 2023-11-26 11:41:25,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3372006.6666666665, ans=0.125 2023-11-26 11:41:28,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3372073.3333333335, ans=0.0 2023-11-26 11:41:47,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3372140.0, ans=0.1 2023-11-26 11:41:50,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3372140.0, ans=0.1 2023-11-26 11:42:11,419 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505850 2023-11-26 11:42:14,558 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 850, loss[loss=0.05371, simple_loss=0.06844, pruned_loss=0.009487, audio_tagging_loss=0.01, over 14830.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08919, pruned_loss=0.01211, audio_tagging_loss=0.009162, over 3004074.93 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:42:23,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3372340.0, ans=0.0 2023-11-26 11:42:26,827 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.82 vs. limit=22.5 2023-11-26 11:42:40,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3372473.3333333335, ans=0.0 2023-11-26 11:42:49,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3372540.0, ans=0.1 2023-11-26 11:42:55,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3372540.0, ans=0.0 2023-11-26 11:42:55,992 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.462e+01 8.767e+01 9.422e+01 1.006e+02 1.207e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 11:42:57,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3372540.0, ans=0.0 2023-11-26 11:43:07,284 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505900 2023-11-26 11:43:10,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3372673.3333333335, ans=0.0 2023-11-26 11:43:11,000 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 900, loss[loss=0.07388, simple_loss=0.1071, pruned_loss=0.01416, audio_tagging_loss=0.006166, over 15254.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08951, pruned_loss=0.0122, audio_tagging_loss=0.009124, over 3013960.60 frames. 
], batch size: 58, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:43:11,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3372673.3333333335, ans=0.0 2023-11-26 11:43:15,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3372673.3333333335, ans=0.0 2023-11-26 11:43:27,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3372740.0, ans=0.125 2023-11-26 11:43:27,715 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.00 vs. limit=15.0 2023-11-26 11:43:42,746 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.64 vs. limit=12.0 2023-11-26 11:44:04,585 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 505950 2023-11-26 11:44:05,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3372940.0, ans=0.1 2023-11-26 11:44:06,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3373006.6666666665, ans=0.2 2023-11-26 11:44:07,739 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 950, loss[loss=0.06654, simple_loss=0.09976, pruned_loss=0.009974, audio_tagging_loss=0.006686, over 14947.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08912, pruned_loss=0.01203, audio_tagging_loss=0.009063, over 3022454.81 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:44:16,553 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.41 vs. limit=22.5 2023-11-26 11:44:33,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=3373140.0, ans=22.5 2023-11-26 11:44:48,916 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.709e+01 9.307e+01 9.774e+01 1.284e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 11:44:59,659 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506000 2023-11-26 11:44:59,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3373273.3333333335, ans=0.125 2023-11-26 11:45:03,182 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1000, loss[loss=0.05925, simple_loss=0.08098, pruned_loss=0.008793, audio_tagging_loss=0.009969, over 15647.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.0882, pruned_loss=0.01191, audio_tagging_loss=0.009001, over 3024334.44 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:45:07,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3373340.0, ans=0.125 2023-11-26 11:45:27,326 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 11:45:30,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3373473.3333333335, ans=0.2 2023-11-26 11:45:35,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3373540.0, ans=0.125 2023-11-26 11:45:37,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3373540.0, ans=0.0 2023-11-26 11:45:43,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3373540.0, ans=0.125 2023-11-26 11:45:55,487 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506050 2023-11-26 11:45:58,637 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1050, loss[loss=0.05633, simple_loss=0.07837, pruned_loss=0.008806, audio_tagging_loss=0.008342, over 14271.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08897, pruned_loss=0.01198, audio_tagging_loss=0.008829, over 3026845.51 frames. ], batch size: 54, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:46:03,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3373673.3333333335, ans=0.0 2023-11-26 11:46:03,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3373673.3333333335, ans=0.125 2023-11-26 11:46:11,666 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.68 vs. limit=15.0 2023-11-26 11:46:28,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3373806.6666666665, ans=0.125 2023-11-26 11:46:36,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3373873.3333333335, ans=0.2 2023-11-26 11:46:40,309 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.136e+01 8.798e+01 9.171e+01 9.764e+01 1.249e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-26 11:46:48,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3373940.0, ans=0.0 2023-11-26 11:46:50,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3373940.0, ans=0.2 2023-11-26 11:46:51,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3373940.0, ans=0.125 2023-11-26 11:46:52,178 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506100 2023-11-26 11:46:55,361 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1100, loss[loss=0.05035, simple_loss=0.06096, pruned_loss=0.01101, audio_tagging_loss=0.008856, over 15252.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.089, pruned_loss=0.01204, audio_tagging_loss=0.00872, over 3028730.50 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:46:57,480 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 11:47:04,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3374006.6666666665, ans=0.0 2023-11-26 11:47:24,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3374140.0, ans=0.125 2023-11-26 11:47:42,361 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:47:47,590 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506150 2023-11-26 11:47:50,751 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1150, loss[loss=0.06567, simple_loss=0.08934, pruned_loss=0.01256, audio_tagging_loss=0.008437, over 14684.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08926, pruned_loss=0.01201, audio_tagging_loss=0.008747, over 3029920.32 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:47:54,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3374340.0, ans=0.5 2023-11-26 11:47:58,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3374340.0, ans=0.125 2023-11-26 11:48:13,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3374473.3333333335, ans=0.0 2023-11-26 11:48:18,158 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=22.5 2023-11-26 11:48:27,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3374540.0, ans=0.2 2023-11-26 11:48:33,720 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.518e+01 8.776e+01 9.671e+01 1.060e+02 1.290e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-26 11:48:38,622 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.86 vs. limit=15.0 2023-11-26 11:48:43,886 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506200 2023-11-26 11:48:47,305 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1200, loss[loss=0.07096, simple_loss=0.1028, pruned_loss=0.01373, audio_tagging_loss=0.005823, over 14881.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08825, pruned_loss=0.01186, audio_tagging_loss=0.008792, over 3022006.49 frames. 
], batch size: 55, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:48:52,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3374673.3333333335, ans=0.1 2023-11-26 11:48:55,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3374673.3333333335, ans=0.125 2023-11-26 11:48:55,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3374673.3333333335, ans=0.125 2023-11-26 11:49:03,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3374740.0, ans=0.125 2023-11-26 11:49:40,143 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506250 2023-11-26 11:49:41,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3374940.0, ans=0.07 2023-11-26 11:49:43,820 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1250, loss[loss=0.07709, simple_loss=0.1005, pruned_loss=0.01654, audio_tagging_loss=0.01027, over 15546.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08795, pruned_loss=0.01184, audio_tagging_loss=0.008764, over 3027617.59 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:49:54,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3375073.3333333335, ans=0.0 2023-11-26 11:49:55,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3375073.3333333335, ans=0.2 2023-11-26 11:50:00,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3375073.3333333335, ans=0.125 2023-11-26 11:50:12,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3375140.0, ans=0.125 2023-11-26 11:50:26,479 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 8.620e+01 9.220e+01 9.934e+01 1.296e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-26 11:50:36,712 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506300 2023-11-26 11:50:39,855 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1300, loss[loss=0.06288, simple_loss=0.07905, pruned_loss=0.01182, audio_tagging_loss=0.01153, over 15984.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.08778, pruned_loss=0.01174, audio_tagging_loss=0.0088, over 3026755.71 frames. 
], batch size: 59, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:50:50,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3375406.6666666665, ans=0.125 2023-11-26 11:51:06,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3375473.3333333335, ans=0.125 2023-11-26 11:51:08,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3375473.3333333335, ans=0.125 2023-11-26 11:51:16,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3375540.0, ans=0.05 2023-11-26 11:51:32,264 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506350 2023-11-26 11:51:33,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3375606.6666666665, ans=0.0 2023-11-26 11:51:36,028 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1350, loss[loss=0.05668, simple_loss=0.0738, pruned_loss=0.01056, audio_tagging_loss=0.009217, over 15702.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.088, pruned_loss=0.01187, audio_tagging_loss=0.008826, over 3033030.38 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:51:40,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3375673.3333333335, ans=0.0 2023-11-26 11:51:44,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3375673.3333333335, ans=0.125 2023-11-26 11:52:04,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3375806.6666666665, ans=0.0 2023-11-26 11:52:06,676 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.62 vs. limit=15.0 2023-11-26 11:52:15,826 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 11:52:18,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3375873.3333333335, ans=0.0 2023-11-26 11:52:19,583 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.306e+01 8.796e+01 9.351e+01 9.969e+01 1.266e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-26 11:52:28,746 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506400 2023-11-26 11:52:32,102 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1400, loss[loss=0.08027, simple_loss=0.1054, pruned_loss=0.01808, audio_tagging_loss=0.009468, over 14534.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08814, pruned_loss=0.01182, audio_tagging_loss=0.008804, over 3032602.33 frames. 
], batch size: 54, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:52:41,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3376006.6666666665, ans=0.125 2023-11-26 11:52:41,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3376006.6666666665, ans=0.125 2023-11-26 11:52:41,762 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.35 vs. limit=15.0 2023-11-26 11:52:54,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3376140.0, ans=0.2 2023-11-26 11:53:09,634 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.17 vs. limit=15.0 2023-11-26 11:53:20,411 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.17 vs. limit=22.5 2023-11-26 11:53:22,197 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.85 vs. limit=10.0 2023-11-26 11:53:25,000 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506450 2023-11-26 11:53:28,648 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1450, loss[loss=0.07347, simple_loss=0.1045, pruned_loss=0.01174, audio_tagging_loss=0.009501, over 14809.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08866, pruned_loss=0.01194, audio_tagging_loss=0.008921, over 3036215.73 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:53:32,344 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.75 vs. limit=15.0 2023-11-26 11:54:03,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3376540.0, ans=0.125 2023-11-26 11:54:07,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3376540.0, ans=0.2 2023-11-26 11:54:12,032 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.549e+01 8.794e+01 9.262e+01 1.013e+02 1.743e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-26 11:54:20,664 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506500 2023-11-26 11:54:23,740 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1500, loss[loss=0.05501, simple_loss=0.0708, pruned_loss=0.006464, audio_tagging_loss=0.01315, over 16528.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08835, pruned_loss=0.01201, audio_tagging_loss=0.008959, over 3036744.39 frames. 
], batch size: 65, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:54:47,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3376806.6666666665, ans=0.1 2023-11-26 11:54:55,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3376806.6666666665, ans=0.07 2023-11-26 11:55:07,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3376940.0, ans=0.0 2023-11-26 11:55:16,599 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506550 2023-11-26 11:55:20,269 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1550, loss[loss=0.05505, simple_loss=0.07479, pruned_loss=0.006877, audio_tagging_loss=0.01078, over 14487.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08796, pruned_loss=0.01201, audio_tagging_loss=0.00904, over 3038688.19 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 4.0 2023-11-26 11:55:35,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3377073.3333333335, ans=0.125 2023-11-26 11:55:42,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3377140.0, ans=0.2 2023-11-26 11:55:48,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3377140.0, ans=0.0 2023-11-26 11:55:54,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3377206.6666666665, ans=0.125 2023-11-26 11:56:06,472 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.131e+01 8.914e+01 9.617e+01 1.050e+02 1.576e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-26 11:56:13,039 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506600 2023-11-26 11:56:16,423 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1600, loss[loss=0.07105, simple_loss=0.08878, pruned_loss=0.0144, audio_tagging_loss=0.01226, over 15143.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08832, pruned_loss=0.01215, audio_tagging_loss=0.009057, over 3037296.51 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:56:21,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3377340.0, ans=0.2 2023-11-26 11:56:39,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3377473.3333333335, ans=0.125 2023-11-26 11:56:55,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3377540.0, ans=0.09899494936611666 2023-11-26 11:57:06,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3377606.6666666665, ans=0.5 2023-11-26 11:57:09,316 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506650 2023-11-26 11:57:11,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3377673.3333333335, ans=0.125 2023-11-26 11:57:12,410 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1650, loss[loss=0.08522, simple_loss=0.1204, pruned_loss=0.01801, audio_tagging_loss=0.007015, over 15000.00 frames. 
], tot_loss[loss=0.06537, simple_loss=0.08828, pruned_loss=0.01215, audio_tagging_loss=0.009089, over 3041490.25 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:57:27,623 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:57:28,795 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:57:41,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3377806.6666666665, ans=0.125 2023-11-26 11:57:44,779 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.97 vs. limit=15.0 2023-11-26 11:57:58,843 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.445e+01 8.844e+01 9.530e+01 1.032e+02 1.539e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 11:58:05,851 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506700 2023-11-26 11:58:09,011 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1700, loss[loss=0.06463, simple_loss=0.08479, pruned_loss=0.01098, audio_tagging_loss=0.01126, over 15510.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08765, pruned_loss=0.01214, audio_tagging_loss=0.009118, over 3046005.15 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:58:28,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3378073.3333333335, ans=0.125 2023-11-26 11:58:50,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3378206.6666666665, ans=0.0 2023-11-26 11:58:55,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3378273.3333333335, ans=0.0 2023-11-26 11:59:02,200 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506750 2023-11-26 11:59:05,380 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1750, loss[loss=0.07607, simple_loss=0.1014, pruned_loss=0.01768, audio_tagging_loss=0.007681, over 14646.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08862, pruned_loss=0.01226, audio_tagging_loss=0.008961, over 3047713.03 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:59:05,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3378340.0, ans=0.125 2023-11-26 11:59:07,999 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2023-11-26 11:59:13,512 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.93 vs. limit=15.0 2023-11-26 11:59:30,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3378473.3333333335, ans=0.0 2023-11-26 11:59:32,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3378473.3333333335, ans=0.125 2023-11-26 11:59:37,378 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.53 vs. 
limit=10.0 2023-11-26 11:59:51,255 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.129e+01 8.598e+01 9.201e+01 1.005e+02 1.270e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-26 11:59:57,757 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506800 2023-11-26 12:00:01,108 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1800, loss[loss=0.05566, simple_loss=0.06354, pruned_loss=0.01477, audio_tagging_loss=0.009121, over 14887.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08922, pruned_loss=0.01237, audio_tagging_loss=0.008834, over 3044020.81 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:00:05,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3378673.3333333335, ans=10.0 2023-11-26 12:00:15,397 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.22 vs. limit=15.0 2023-11-26 12:00:18,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3378740.0, ans=0.125 2023-11-26 12:00:19,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3378740.0, ans=0.125 2023-11-26 12:00:23,787 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.26 vs. limit=22.5 2023-11-26 12:00:54,433 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506850 2023-11-26 12:00:57,541 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1850, loss[loss=0.0543, simple_loss=0.06884, pruned_loss=0.009697, audio_tagging_loss=0.01018, over 15290.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09005, pruned_loss=0.01236, audio_tagging_loss=0.008741, over 3044819.31 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:01:06,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3379006.6666666665, ans=0.0 2023-11-26 12:01:43,739 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 8.799e+01 9.499e+01 1.025e+02 1.305e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 12:01:50,839 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506900 2023-11-26 12:01:53,963 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1900, loss[loss=0.06857, simple_loss=0.09076, pruned_loss=0.01167, audio_tagging_loss=0.01152, over 15322.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08958, pruned_loss=0.01227, audio_tagging_loss=0.008731, over 3040608.84 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:02:17,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3379473.3333333335, ans=0.125 2023-11-26 12:02:32,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3379540.0, ans=0.125 2023-11-26 12:02:35,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3379540.0, ans=0.125 2023-11-26 12:02:36,045 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.13 vs. 
limit=15.0 2023-11-26 12:02:38,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3379606.6666666665, ans=0.0 2023-11-26 12:02:46,124 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 506950 2023-11-26 12:02:49,331 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 1950, loss[loss=0.08149, simple_loss=0.1148, pruned_loss=0.01621, audio_tagging_loss=0.007883, over 14930.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08861, pruned_loss=0.01214, audio_tagging_loss=0.008772, over 3038798.29 frames. ], batch size: 53, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:02:49,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3379673.3333333335, ans=0.1 2023-11-26 12:03:24,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3379873.3333333335, ans=0.0 2023-11-26 12:03:31,848 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.67 vs. limit=10.0 2023-11-26 12:03:35,544 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.070e+01 8.710e+01 9.475e+01 1.012e+02 2.962e+02, threshold=1.895e+02, percent-clipped=1.0 2023-11-26 12:03:42,557 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507000 2023-11-26 12:03:45,962 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2000, loss[loss=0.05143, simple_loss=0.06884, pruned_loss=0.008088, audio_tagging_loss=0.008923, over 14081.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08915, pruned_loss=0.01229, audio_tagging_loss=0.008711, over 3040011.53 frames. ], batch size: 54, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:03:53,101 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.28 vs. limit=15.0 2023-11-26 12:03:59,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3380073.3333333335, ans=0.0 2023-11-26 12:04:03,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3380073.3333333335, ans=0.07 2023-11-26 12:04:08,127 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.06 vs. limit=15.0 2023-11-26 12:04:37,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3380273.3333333335, ans=0.0 2023-11-26 12:04:39,503 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507050 2023-11-26 12:04:42,091 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.13 vs. limit=15.0 2023-11-26 12:04:42,698 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2050, loss[loss=0.07873, simple_loss=0.1093, pruned_loss=0.01352, audio_tagging_loss=0.01054, over 16617.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08972, pruned_loss=0.01236, audio_tagging_loss=0.008636, over 3040709.57 frames. 
], batch size: 61, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:04:46,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=3380340.0, ans=15.0 2023-11-26 12:04:51,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3380340.0, ans=10.0 2023-11-26 12:05:01,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3380406.6666666665, ans=0.0 2023-11-26 12:05:05,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3380473.3333333335, ans=0.1 2023-11-26 12:05:19,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3380540.0, ans=0.125 2023-11-26 12:05:28,416 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.082e+01 8.859e+01 9.273e+01 1.013e+02 1.302e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 12:05:34,983 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507100 2023-11-26 12:05:38,118 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2100, loss[loss=0.07952, simple_loss=0.1091, pruned_loss=0.01699, audio_tagging_loss=0.007962, over 15446.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08925, pruned_loss=0.01229, audio_tagging_loss=0.008653, over 3040602.76 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:05:48,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3380740.0, ans=0.125 2023-11-26 12:06:12,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3380873.3333333335, ans=0.09899494936611666 2023-11-26 12:06:13,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3380873.3333333335, ans=0.1 2023-11-26 12:06:19,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3380873.3333333335, ans=0.125 2023-11-26 12:06:20,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3380873.3333333335, ans=0.125 2023-11-26 12:06:22,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3380940.0, ans=0.125 2023-11-26 12:06:30,270 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507150 2023-11-26 12:06:33,901 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2150, loss[loss=0.0758, simple_loss=0.1042, pruned_loss=0.01828, audio_tagging_loss=0.005434, over 15325.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09046, pruned_loss=0.01247, audio_tagging_loss=0.008508, over 3037838.38 frames. 
], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:06:50,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3381073.3333333335, ans=0.125 2023-11-26 12:06:51,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3381073.3333333335, ans=0.125 2023-11-26 12:06:52,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3381073.3333333335, ans=0.125 2023-11-26 12:06:58,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3381140.0, ans=0.125 2023-11-26 12:07:07,056 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.89 vs. limit=15.0 2023-11-26 12:07:07,564 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 12:07:15,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3381206.6666666665, ans=0.125 2023-11-26 12:07:19,711 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.860e+01 8.634e+01 9.242e+01 1.004e+02 1.355e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-26 12:07:19,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3381273.3333333335, ans=0.0 2023-11-26 12:07:26,696 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507200 2023-11-26 12:07:30,661 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2200, loss[loss=0.04589, simple_loss=0.06023, pruned_loss=0.00618, audio_tagging_loss=0.009594, over 14569.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09018, pruned_loss=0.01238, audio_tagging_loss=0.008619, over 3034724.04 frames. ], batch size: 54, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:07:33,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3381340.0, ans=0.2 2023-11-26 12:07:46,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3381406.6666666665, ans=0.0 2023-11-26 12:07:50,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3381406.6666666665, ans=0.125 2023-11-26 12:07:52,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3381473.3333333335, ans=0.07 2023-11-26 12:08:18,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3381606.6666666665, ans=0.125 2023-11-26 12:08:23,010 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507250 2023-11-26 12:08:26,157 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2250, loss[loss=0.07541, simple_loss=0.1141, pruned_loss=0.01317, audio_tagging_loss=0.005198, over 16572.00 frames. 
], tot_loss[loss=0.06617, simple_loss=0.09043, pruned_loss=0.0123, audio_tagging_loss=0.008655, over 3030779.89 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:08:32,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3381673.3333333335, ans=0.2 2023-11-26 12:08:42,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3381740.0, ans=0.125 2023-11-26 12:08:55,547 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.66 vs. limit=10.0 2023-11-26 12:08:59,442 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.94 vs. limit=15.0 2023-11-26 12:09:11,481 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.618e+01 8.710e+01 9.301e+01 1.010e+02 1.448e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-26 12:09:17,935 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507300 2023-11-26 12:09:21,642 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2300, loss[loss=0.07318, simple_loss=0.1099, pruned_loss=0.01114, audio_tagging_loss=0.007095, over 14977.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09088, pruned_loss=0.0124, audio_tagging_loss=0.008721, over 3038089.62 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:09:21,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3382006.6666666665, ans=0.125 2023-11-26 12:09:41,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=3382073.3333333335, ans=15.0 2023-11-26 12:09:48,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3382140.0, ans=0.125 2023-11-26 12:10:10,776 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 12:10:14,595 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507350 2023-11-26 12:10:17,690 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2350, loss[loss=0.08133, simple_loss=0.116, pruned_loss=0.01774, audio_tagging_loss=0.005602, over 16087.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09024, pruned_loss=0.01243, audio_tagging_loss=0.008817, over 3037660.91 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:10:23,078 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.15 vs. 
limit=22.5 2023-11-26 12:10:30,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3382406.6666666665, ans=0.2 2023-11-26 12:10:53,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3382540.0, ans=0.2 2023-11-26 12:10:54,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3382540.0, ans=0.1 2023-11-26 12:11:01,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3382606.6666666665, ans=0.125 2023-11-26 12:11:03,684 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.81 vs. limit=15.0 2023-11-26 12:11:04,120 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.454e+01 8.890e+01 9.561e+01 1.021e+02 1.290e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-26 12:11:08,990 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.07 vs. limit=15.0 2023-11-26 12:11:11,220 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507400 2023-11-26 12:11:14,681 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2400, loss[loss=0.05579, simple_loss=0.07068, pruned_loss=0.0111, audio_tagging_loss=0.009354, over 15036.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.08975, pruned_loss=0.01237, audio_tagging_loss=0.008986, over 3035888.20 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 12:11:25,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3382740.0, ans=0.2 2023-11-26 12:11:27,033 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.25 vs. limit=15.0 2023-11-26 12:11:56,383 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.46 vs. limit=6.0 2023-11-26 12:11:59,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3382940.0, ans=0.0 2023-11-26 12:12:06,705 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507450 2023-11-26 12:12:09,771 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2450, loss[loss=0.069, simple_loss=0.09593, pruned_loss=0.01441, audio_tagging_loss=0.006629, over 14719.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08889, pruned_loss=0.01227, audio_tagging_loss=0.00907, over 3028975.51 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:12:10,359 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.76 vs. limit=12.0 2023-11-26 12:12:16,095 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.61 vs. 
limit=15.0 2023-11-26 12:12:52,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3383206.6666666665, ans=0.125 2023-11-26 12:12:57,199 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.415e+01 8.728e+01 9.307e+01 9.934e+01 1.225e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 12:13:01,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3383273.3333333335, ans=0.0 2023-11-26 12:13:02,386 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507500 2023-11-26 12:13:05,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3383340.0, ans=0.125 2023-11-26 12:13:06,024 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2500, loss[loss=0.06244, simple_loss=0.09231, pruned_loss=0.01054, audio_tagging_loss=0.005737, over 15027.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.089, pruned_loss=0.01214, audio_tagging_loss=0.009065, over 3038648.43 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:13:06,659 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.45 vs. limit=6.0 2023-11-26 12:13:13,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3383340.0, ans=0.125 2023-11-26 12:13:19,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3383406.6666666665, ans=0.125 2023-11-26 12:13:46,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3383540.0, ans=0.2 2023-11-26 12:13:55,204 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.93 vs. limit=22.5 2023-11-26 12:13:59,021 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507550 2023-11-26 12:14:02,166 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2550, loss[loss=0.06432, simple_loss=0.08711, pruned_loss=0.01064, audio_tagging_loss=0.01013, over 15129.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09012, pruned_loss=0.01232, audio_tagging_loss=0.008922, over 3041178.55 frames. 
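Note: in each optim.py line the reported threshold is Clipping_scale times the median of the grad-norm quartiles (in the entry above, 2.0 * 9.307e+01 = 1.861e+02), and percent-clipped counts norms above that threshold. The snippet below is a minimal restatement of that bookkeeping with assumed names, not the optimizer's actual implementation.

```python
import torch

def clipping_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    """Return (quartiles, threshold, percent_clipped) as in the optim.py lines."""
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]                      # scale * median
    pct = 100.0 * (grad_norms > threshold).float().mean()
    return q, threshold.item(), pct.item()

# The five quartile values logged just above:
norms = torch.tensor([74.15, 87.28, 93.07, 99.34, 122.5])
_, threshold, pct = clipping_stats(norms)
print(f"threshold={threshold:.4g}, percent-clipped={pct}")  # 186.1, 0.0
```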
], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:14:11,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3383673.3333333335, ans=0.125 2023-11-26 12:14:14,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3383740.0, ans=0.0 2023-11-26 12:14:39,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3383873.3333333335, ans=0.1 2023-11-26 12:14:48,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3383940.0, ans=0.1 2023-11-26 12:14:49,149 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.468e+01 8.661e+01 9.276e+01 1.004e+02 1.739e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 12:14:54,971 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507600 2023-11-26 12:14:58,343 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2600, loss[loss=0.05869, simple_loss=0.07102, pruned_loss=0.01318, audio_tagging_loss=0.01, over 13686.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08917, pruned_loss=0.01219, audio_tagging_loss=0.008816, over 3040380.28 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:14:59,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3384006.6666666665, ans=0.1 2023-11-26 12:15:14,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3384073.3333333335, ans=0.1 2023-11-26 12:15:28,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3384140.0, ans=0.1 2023-11-26 12:15:30,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3384140.0, ans=0.0 2023-11-26 12:15:44,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3384273.3333333335, ans=0.125 2023-11-26 12:15:51,092 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507650 2023-11-26 12:15:51,426 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=22.5 2023-11-26 12:15:54,229 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2650, loss[loss=0.0695, simple_loss=0.1015, pruned_loss=0.01074, audio_tagging_loss=0.008009, over 15859.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09063, pruned_loss=0.01256, audio_tagging_loss=0.008739, over 3041165.97 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:15:55,729 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.59 vs. 
limit=15.0 2023-11-26 12:16:07,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3384406.6666666665, ans=0.0 2023-11-26 12:16:39,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3384606.6666666665, ans=0.125 2023-11-26 12:16:42,421 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.481e+01 8.705e+01 9.342e+01 1.013e+02 1.276e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 12:16:47,308 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507700 2023-11-26 12:16:50,501 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2700, loss[loss=0.05708, simple_loss=0.07697, pruned_loss=0.01011, audio_tagging_loss=0.008485, over 14721.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09022, pruned_loss=0.01259, audio_tagging_loss=0.008683, over 3044986.56 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:16:57,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3384673.3333333335, ans=0.125 2023-11-26 12:17:17,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3384806.6666666665, ans=0.125 2023-11-26 12:17:22,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3384806.6666666665, ans=0.0 2023-11-26 12:17:32,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3384873.3333333335, ans=0.95 2023-11-26 12:17:42,634 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507750 2023-11-26 12:17:42,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3384940.0, ans=0.1 2023-11-26 12:17:45,894 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2750, loss[loss=0.05271, simple_loss=0.06644, pruned_loss=0.00681, audio_tagging_loss=0.01268, over 14443.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.0905, pruned_loss=0.01259, audio_tagging_loss=0.008709, over 3036181.45 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:18:02,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3385073.3333333335, ans=0.1 2023-11-26 12:18:14,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3385140.0, ans=10.0 2023-11-26 12:18:16,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3385140.0, ans=0.0 2023-11-26 12:18:25,626 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.77 vs. limit=15.0 2023-11-26 12:18:29,033 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.58 vs. 
limit=15.0 2023-11-26 12:18:34,404 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.630e+01 8.797e+01 9.310e+01 1.006e+02 1.204e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-26 12:18:34,906 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0 2023-11-26 12:18:36,029 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 12:18:36,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3385273.3333333335, ans=0.2 2023-11-26 12:18:39,239 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507800 2023-11-26 12:18:42,662 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2800, loss[loss=0.04528, simple_loss=0.06199, pruned_loss=0.004806, audio_tagging_loss=0.009475, over 15924.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.0897, pruned_loss=0.01232, audio_tagging_loss=0.008683, over 3033333.03 frames. ], batch size: 63, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:18:55,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3385406.6666666665, ans=0.1 2023-11-26 12:19:14,847 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.92 vs. limit=12.0 2023-11-26 12:19:16,962 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.03 vs. limit=10.0 2023-11-26 12:19:33,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3385606.6666666665, ans=0.125 2023-11-26 12:19:36,156 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507850 2023-11-26 12:19:39,347 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2850, loss[loss=0.06194, simple_loss=0.08325, pruned_loss=0.01126, audio_tagging_loss=0.009058, over 15473.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.09014, pruned_loss=0.01231, audio_tagging_loss=0.008612, over 3039518.32 frames. 
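Note: the recurring "Exclude cut" warnings follow from the arithmetic visible in the entry above: a 1-second AudioSet cut yields 100 feature frames, subsampling leaves 23 encoder frames, and the dummy transcript tokenizes to 24 BPE tokens, so there are more labels than output frames and the cut cannot be aligned by the transducer loss. Below is a hedged sketch of such a filter; the subsampling formula is an assumption chosen to reproduce the logged 100 -> 23, not a quote of the code.

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Assumed two-stage convolutional subsampling; consistent with the
    # logged pair (before=100, after=23).
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Keep a cut only if the encoder output is long enough for its tokens."""
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(num_frames=100, num_tokens=24)  # the excluded cut above
```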
], batch size: 58, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:19:52,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3385740.0, ans=0.04949747468305833 2023-11-26 12:19:56,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3385740.0, ans=0.1 2023-11-26 12:20:04,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3385806.6666666665, ans=0.125 2023-11-26 12:20:05,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3385806.6666666665, ans=0.125 2023-11-26 12:20:08,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3385806.6666666665, ans=0.0 2023-11-26 12:20:20,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3385873.3333333335, ans=0.0 2023-11-26 12:20:21,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3385873.3333333335, ans=0.0 2023-11-26 12:20:28,302 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.236e+01 8.713e+01 9.303e+01 9.917e+01 1.324e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 12:20:31,701 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507900 2023-11-26 12:20:34,806 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2900, loss[loss=0.05756, simple_loss=0.07258, pruned_loss=0.0114, audio_tagging_loss=0.009872, over 13887.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08964, pruned_loss=0.01219, audio_tagging_loss=0.008633, over 3039820.44 frames. ], batch size: 54, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:20:37,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3386006.6666666665, ans=0.125 2023-11-26 12:20:41,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3386006.6666666665, ans=0.125 2023-11-26 12:20:59,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3386140.0, ans=0.125 2023-11-26 12:21:00,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3386140.0, ans=0.125 2023-11-26 12:21:11,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3386206.6666666665, ans=0.125 2023-11-26 12:21:11,767 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.21 vs. limit=15.0 2023-11-26 12:21:27,878 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 507950 2023-11-26 12:21:31,526 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 2950, loss[loss=0.05616, simple_loss=0.07249, pruned_loss=0.01267, audio_tagging_loss=0.007249, over 14594.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09014, pruned_loss=0.01226, audio_tagging_loss=0.008706, over 3033810.14 frames. 
], batch size: 58, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:21:35,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3386340.0, ans=0.2 2023-11-26 12:21:40,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3386340.0, ans=0.0 2023-11-26 12:21:50,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3386406.6666666665, ans=0.125 2023-11-26 12:22:02,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3386473.3333333335, ans=0.04949747468305833 2023-11-26 12:22:20,945 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.213e+01 8.672e+01 9.532e+01 9.988e+01 1.402e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 12:22:24,224 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508000 2023-11-26 12:22:25,516 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-508000.pt 2023-11-26 12:22:30,158 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3000, loss[loss=0.06705, simple_loss=0.09515, pruned_loss=0.01279, audio_tagging_loss=0.006683, over 15590.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08985, pruned_loss=0.01223, audio_tagging_loss=0.008734, over 3037648.20 frames. ], batch size: 60, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:22:30,161 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-26 12:22:45,398 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1554, 4.5371, 5.2449, 4.9046], device='cuda:0') 2023-11-26 12:22:55,083 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.8055, 5.8333, 5.8994, 5.9031], device='cuda:0') 2023-11-26 12:23:02,719 INFO [train_asr.py:1267] (0/4) Epoch 43, validation: loss=0.05754, simple_loss=0.05056, pruned_loss=0.00524, audio_tagging_loss=0.02702, over 4681554.00 frames. 2023-11-26 12:23:02,720 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-26 12:23:04,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3386673.3333333335, ans=0.1 2023-11-26 12:23:05,578 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.79 vs. limit=15.0 2023-11-26 12:23:13,793 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.64 vs. limit=10.0 2023-11-26 12:23:23,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3386740.0, ans=0.125 2023-11-26 12:23:27,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3386806.6666666665, ans=0.125 2023-11-26 12:23:33,838 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.15 vs. 
limit=15.0 2023-11-26 12:23:47,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3386940.0, ans=0.1 2023-11-26 12:23:55,923 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508050 2023-11-26 12:23:58,983 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3050, loss[loss=0.06462, simple_loss=0.09158, pruned_loss=0.0118, audio_tagging_loss=0.007034, over 15097.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09038, pruned_loss=0.01236, audio_tagging_loss=0.008827, over 3036761.94 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:24:12,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3387073.3333333335, ans=0.125 2023-11-26 12:24:24,042 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.38 vs. limit=12.0 2023-11-26 12:24:32,801 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 12:24:48,665 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.074e+01 8.532e+01 9.331e+01 1.001e+02 1.251e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-26 12:24:52,533 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508100 2023-11-26 12:24:55,663 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3100, loss[loss=0.06103, simple_loss=0.07579, pruned_loss=0.01232, audio_tagging_loss=0.01082, over 15167.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09084, pruned_loss=0.01239, audio_tagging_loss=0.008762, over 3031779.88 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:25:08,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3387406.6666666665, ans=0.0 2023-11-26 12:25:10,111 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.42 vs. limit=15.0 2023-11-26 12:25:37,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3387540.0, ans=0.125 2023-11-26 12:25:39,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3387606.6666666665, ans=0.2 2023-11-26 12:25:43,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3387606.6666666665, ans=0.0 2023-11-26 12:25:47,452 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508150 2023-11-26 12:25:50,633 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3150, loss[loss=0.07789, simple_loss=0.1027, pruned_loss=0.01662, audio_tagging_loss=0.009933, over 15307.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09092, pruned_loss=0.01249, audio_tagging_loss=0.008844, over 3028334.89 frames. 
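Note: a few entries above, the run saves checkpoint-508000.pt exactly when the global batch index reaches 508000, and runs a validation pass at batch 3000 of the epoch, reporting the validation loss and peak memory. The cadence sketch below uses assumed interval constants and stand-in helpers; only the round batch indices are taken from the log.

```python
SAVE_EVERY_N = 4000    # assumption: 508000 is a multiple of 4000
VALID_INTERVAL = 3000  # assumption: validation is logged at epoch batch 3000

def save_checkpoint(path: str) -> None:   # stand-in helper
    print(f"Saving checkpoint to {path}")

def compute_validation_loss() -> None:    # stand-in helper
    print("Computing validation loss")

def on_batch_end(global_batch_idx: int, epoch_batch_idx: int) -> None:
    if global_batch_idx % SAVE_EVERY_N == 0:
        save_checkpoint(f"checkpoint-{global_batch_idx}.pt")
    if epoch_batch_idx % VALID_INTERVAL == 0:
        compute_validation_loss()

on_batch_end(global_batch_idx=508000, epoch_batch_idx=3000)  # both fire, as logged
```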
], batch size: 56, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:26:06,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3387740.0, ans=0.0 2023-11-26 12:26:20,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3387806.6666666665, ans=0.125 2023-11-26 12:26:20,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3387806.6666666665, ans=0.1 2023-11-26 12:26:24,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3387873.3333333335, ans=0.125 2023-11-26 12:26:25,337 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.39 vs. limit=15.0 2023-11-26 12:26:31,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3387873.3333333335, ans=0.125 2023-11-26 12:26:39,865 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.518e+01 8.956e+01 9.437e+01 1.017e+02 1.314e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 12:26:40,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3387940.0, ans=0.0 2023-11-26 12:26:43,707 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508200 2023-11-26 12:26:47,022 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3200, loss[loss=0.07349, simple_loss=0.1022, pruned_loss=0.01424, audio_tagging_loss=0.008161, over 16253.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.0912, pruned_loss=0.01254, audio_tagging_loss=0.008961, over 3027473.53 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:27:02,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3388073.3333333335, ans=0.125 2023-11-26 12:27:11,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3388140.0, ans=0.0 2023-11-26 12:27:39,639 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 12:27:40,474 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508250 2023-11-26 12:27:44,166 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3250, loss[loss=0.06491, simple_loss=0.08572, pruned_loss=0.01125, audio_tagging_loss=0.0108, over 15049.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.0909, pruned_loss=0.01243, audio_tagging_loss=0.009022, over 3032263.34 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:27:44,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3388340.0, ans=0.05 2023-11-26 12:27:48,194 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.91 vs. 
limit=15.0 2023-11-26 12:27:58,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3388406.6666666665, ans=0.05 2023-11-26 12:28:12,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3388473.3333333335, ans=0.0 2023-11-26 12:28:27,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3388540.0, ans=0.2 2023-11-26 12:28:33,657 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.209e+01 8.911e+01 9.477e+01 1.008e+02 1.285e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-26 12:28:36,978 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508300 2023-11-26 12:28:40,129 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3300, loss[loss=0.06677, simple_loss=0.09344, pruned_loss=0.01158, audio_tagging_loss=0.008467, over 15202.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09037, pruned_loss=0.0123, audio_tagging_loss=0.009041, over 3036427.22 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:28:40,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3388673.3333333335, ans=0.125 2023-11-26 12:28:47,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3388673.3333333335, ans=0.125 2023-11-26 12:28:50,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3388740.0, ans=0.2 2023-11-26 12:28:54,949 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.54 vs. limit=15.0 2023-11-26 12:28:55,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3388740.0, ans=0.125 2023-11-26 12:29:12,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3388873.3333333335, ans=0.09899494936611666 2023-11-26 12:29:21,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3388873.3333333335, ans=0.125 2023-11-26 12:29:27,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3388940.0, ans=0.125 2023-11-26 12:29:31,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3388940.0, ans=0.125 2023-11-26 12:29:32,638 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508350 2023-11-26 12:29:35,766 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3350, loss[loss=0.07364, simple_loss=0.1028, pruned_loss=0.01648, audio_tagging_loss=0.005742, over 14898.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08962, pruned_loss=0.01233, audio_tagging_loss=0.008984, over 3040405.77 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:29:41,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3389006.6666666665, ans=0.1 2023-11-26 12:29:48,762 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.09 vs. 
limit=22.5 2023-11-26 12:30:25,724 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.370e+01 8.755e+01 9.551e+01 1.033e+02 1.237e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-26 12:30:29,033 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508400 2023-11-26 12:30:33,015 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3400, loss[loss=0.06641, simple_loss=0.09132, pruned_loss=0.01499, audio_tagging_loss=0.005758, over 15776.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09049, pruned_loss=0.0124, audio_tagging_loss=0.008837, over 3052188.41 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:30:36,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3389340.0, ans=0.125 2023-11-26 12:30:44,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3389406.6666666665, ans=0.125 2023-11-26 12:31:12,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3389540.0, ans=0.125 2023-11-26 12:31:25,708 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508450 2023-11-26 12:31:28,815 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3450, loss[loss=0.05814, simple_loss=0.08079, pruned_loss=0.01002, audio_tagging_loss=0.007732, over 15853.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09002, pruned_loss=0.01232, audio_tagging_loss=0.008806, over 3046847.68 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:31:49,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3389806.6666666665, ans=0.125 2023-11-26 12:32:01,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3389873.3333333335, ans=0.125 2023-11-26 12:32:03,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3389873.3333333335, ans=0.125 2023-11-26 12:32:09,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3389873.3333333335, ans=0.0 2023-11-26 12:32:18,082 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.080e+01 8.804e+01 9.534e+01 1.051e+02 1.288e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-26 12:32:19,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3389940.0, ans=0.0 2023-11-26 12:32:21,346 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508500 2023-11-26 12:32:25,072 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3500, loss[loss=0.06275, simple_loss=0.09361, pruned_loss=0.009785, audio_tagging_loss=0.006158, over 15361.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08961, pruned_loss=0.01219, audio_tagging_loss=0.008791, over 3044550.05 frames. 
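Note: the scaling.py:213 lines record ScheduledFloat hyperparameters (dropout probabilities, skip rates, balancer limits) whose value is looked up as a function of batch_count. A minimal piecewise-linear schedule in that spirit is sketched below; the breakpoints are invented for illustration, and only the "value depends on batch_count" behavior is taken from the log.

```python
from bisect import bisect_right

class ScheduledFloatSketch:
    """Piecewise-linear function of batch_count (illustrative only)."""

    def __init__(self, *points: tuple[float, float]) -> None:
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def value(self, batch_count: float) -> float:
        i = bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a dropout_p decaying from 0.3 to a floor of 0.1 (invented numbers):
dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value(3389340.0))  # 0.1: far past the end of the schedule
```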
], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:32:27,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3390006.6666666665, ans=0.125 2023-11-26 12:32:36,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3390073.3333333335, ans=0.1 2023-11-26 12:32:49,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3390140.0, ans=0.125 2023-11-26 12:32:50,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3390140.0, ans=0.1 2023-11-26 12:32:55,575 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 12:32:58,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3390206.6666666665, ans=0.125 2023-11-26 12:33:04,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3390206.6666666665, ans=0.125 2023-11-26 12:33:10,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3390273.3333333335, ans=0.125 2023-11-26 12:33:17,938 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508550 2023-11-26 12:33:21,104 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3550, loss[loss=0.06503, simple_loss=0.08633, pruned_loss=0.01142, audio_tagging_loss=0.01045, over 14031.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08898, pruned_loss=0.01225, audio_tagging_loss=0.008723, over 3045186.42 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:33:40,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3390406.6666666665, ans=0.1 2023-11-26 12:33:52,128 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.77 vs. limit=15.0 2023-11-26 12:33:57,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3390540.0, ans=0.1 2023-11-26 12:34:10,931 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.836e+01 8.734e+01 9.266e+01 9.991e+01 1.201e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-26 12:34:14,233 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508600 2023-11-26 12:34:17,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3390673.3333333335, ans=0.125 2023-11-26 12:34:18,197 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3600, loss[loss=0.06288, simple_loss=0.08856, pruned_loss=0.01048, audio_tagging_loss=0.008122, over 13959.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08863, pruned_loss=0.01225, audio_tagging_loss=0.008716, over 3047083.29 frames. 
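Note: the scaling.py:1022 "Whitening" lines compare a per-module statistic of the activations against a limit (e.g. metric=10.77 vs. limit=15.0 above) and only intervene when the limit is exceeded. The exact statistic is internal to the model code; the function below is a generic whiteness proxy, a stand-in that illustrates the kind of covariance check being reported, not the logged formula.

```python
import torch

def whiteness_proxy(feats: torch.Tensor) -> float:
    """Largest-to-mean eigenvalue ratio of the feature covariance.

    1.0 for exactly white features; grows as channels become correlated
    or badly scaled. feats has shape (num_frames, num_channels).
    """
    feats = feats - feats.mean(dim=0, keepdim=True)
    cov = feats.T @ feats / feats.shape[0]
    eigs = torch.linalg.eigvalsh(cov)  # real eigenvalues, ascending
    return (eigs[-1] / eigs.mean().clamp(min=1e-20)).item()

x = torch.randn(1000, 512)
print(whiteness_proxy(x))  # ~3 at this sample size; -> 1.0 as frames grow
```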
], batch size: 54, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:34:19,916 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.51 vs. limit=10.0 2023-11-26 12:34:27,023 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 12:34:36,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3390740.0, ans=0.2 2023-11-26 12:34:39,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3390806.6666666665, ans=0.0 2023-11-26 12:35:10,367 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508650 2023-11-26 12:35:13,508 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3650, loss[loss=0.06143, simple_loss=0.08561, pruned_loss=0.01001, audio_tagging_loss=0.008612, over 16245.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08968, pruned_loss=0.0125, audio_tagging_loss=0.008589, over 3048992.87 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:35:26,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3391073.3333333335, ans=0.125 2023-11-26 12:35:27,096 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.29 vs. limit=15.0 2023-11-26 12:35:30,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3391073.3333333335, ans=0.0 2023-11-26 12:35:31,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3391073.3333333335, ans=0.125 2023-11-26 12:35:33,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3391073.3333333335, ans=15.0 2023-11-26 12:35:58,337 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.80 vs. limit=15.0 2023-11-26 12:36:02,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3391273.3333333335, ans=0.125 2023-11-26 12:36:03,218 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.823e+01 8.759e+01 9.340e+01 1.006e+02 1.350e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 12:36:03,780 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.94 vs. limit=15.0 2023-11-26 12:36:06,534 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508700 2023-11-26 12:36:10,111 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3700, loss[loss=0.06896, simple_loss=0.09534, pruned_loss=0.01259, audio_tagging_loss=0.008703, over 14718.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08938, pruned_loss=0.01236, audio_tagging_loss=0.00863, over 3052138.53 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:36:13,907 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.60 vs. 
limit=22.5 2023-11-26 12:36:23,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3391406.6666666665, ans=0.125 2023-11-26 12:36:29,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3391406.6666666665, ans=0.1 2023-11-26 12:36:51,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3391540.0, ans=0.0 2023-11-26 12:37:02,947 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508750 2023-11-26 12:37:06,113 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3750, loss[loss=0.05325, simple_loss=0.0738, pruned_loss=0.006366, audio_tagging_loss=0.009984, over 15852.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09034, pruned_loss=0.01257, audio_tagging_loss=0.008655, over 3061897.47 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:37:20,930 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.02 vs. limit=15.0 2023-11-26 12:37:29,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3391806.6666666665, ans=0.125 2023-11-26 12:37:32,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3391806.6666666665, ans=0.2 2023-11-26 12:37:36,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3391806.6666666665, ans=0.0 2023-11-26 12:37:39,241 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.37 vs. limit=15.0 2023-11-26 12:37:40,517 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.55 vs. limit=15.0 2023-11-26 12:37:45,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3391873.3333333335, ans=0.1 2023-11-26 12:37:45,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3391873.3333333335, ans=0.125 2023-11-26 12:37:46,460 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 12:37:50,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3391940.0, ans=0.2 2023-11-26 12:37:52,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3391940.0, ans=0.125 2023-11-26 12:37:54,806 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 8.936e+01 9.506e+01 1.051e+02 1.254e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-26 12:37:58,568 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508800 2023-11-26 12:38:01,928 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3800, loss[loss=0.05671, simple_loss=0.08056, pruned_loss=0.008592, audio_tagging_loss=0.007837, over 15562.00 frames. ], tot_loss[loss=0.06745, simple_loss=0.09195, pruned_loss=0.01279, audio_tagging_loss=0.008683, over 3058582.21 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:38:05,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3392006.6666666665, ans=0.125 2023-11-26 12:38:32,773 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.65 vs. limit=10.0 2023-11-26 12:38:37,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=3392206.6666666665, ans=22.5 2023-11-26 12:38:53,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3392273.3333333335, ans=0.1 2023-11-26 12:38:54,730 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508850 2023-11-26 12:38:57,866 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3850, loss[loss=0.05601, simple_loss=0.07525, pruned_loss=0.009226, audio_tagging_loss=0.009156, over 14395.00 frames. ], tot_loss[loss=0.06777, simple_loss=0.09235, pruned_loss=0.01287, audio_tagging_loss=0.008725, over 3054274.43 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:39:08,889 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=15.0 2023-11-26 12:39:19,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3392473.3333333335, ans=0.0 2023-11-26 12:39:35,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3392540.0, ans=0.125 2023-11-26 12:39:49,121 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 8.737e+01 9.345e+01 1.016e+02 1.351e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 12:39:51,329 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508900 2023-11-26 12:39:54,435 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3900, loss[loss=0.06111, simple_loss=0.08299, pruned_loss=0.01026, audio_tagging_loss=0.009349, over 15296.00 frames. ], tot_loss[loss=0.06744, simple_loss=0.09194, pruned_loss=0.01276, audio_tagging_loss=0.008712, over 3052030.35 frames. 
], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:39:57,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3392673.3333333335, ans=0.1 2023-11-26 12:40:09,468 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.51 vs. limit=15.0 2023-11-26 12:40:15,278 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. limit=6.0 2023-11-26 12:40:28,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3392873.3333333335, ans=0.0 2023-11-26 12:40:34,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3392873.3333333335, ans=0.125 2023-11-26 12:40:37,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3392873.3333333335, ans=0.2 2023-11-26 12:40:46,832 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 508950 2023-11-26 12:40:48,145 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 12:40:50,005 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 3950, loss[loss=0.05807, simple_loss=0.07926, pruned_loss=0.008635, audio_tagging_loss=0.009808, over 15874.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09124, pruned_loss=0.01267, audio_tagging_loss=0.008839, over 3051676.88 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:41:00,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3393073.3333333335, ans=0.07 2023-11-26 12:41:09,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=3393073.3333333335, ans=15.0 2023-11-26 12:41:21,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3393140.0, ans=0.125 2023-11-26 12:41:37,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3393273.3333333335, ans=0.1 2023-11-26 12:41:38,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3393273.3333333335, ans=0.2 2023-11-26 12:41:40,954 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.644e+01 8.928e+01 9.625e+01 1.027e+02 1.240e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-26 12:41:42,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3393273.3333333335, ans=0.2 2023-11-26 12:41:43,136 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509000 2023-11-26 12:41:46,544 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4000, loss[loss=0.07695, simple_loss=0.103, pruned_loss=0.01703, audio_tagging_loss=0.00843, over 15514.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09087, pruned_loss=0.01265, audio_tagging_loss=0.008914, over 3046545.65 frames. 
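Note: the grad_scale field in these batch lines moves among 8.0, 16.0, and 32.0. With fp16 training this is standard dynamic loss scaling: the scale is halved when a step produces inf/nan gradients and doubled again after a long enough run of clean steps. The snippet below shows the stock PyTorch pattern with those dynamics; it is generic usage, not the repository's training loop.

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0,       # the scale most often seen in these lines
    growth_factor=2.0,     # 16 -> 32 after growth_interval clean steps
    backoff_factor=0.5,    # 32 -> 16 -> 8 on overflowing steps
    growth_interval=2000,  # PyTorch default; the log's cadence may differ
)

def train_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(batch))
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # skipped internally if gradients overflowed
    scaler.update()         # applies the halve/double dynamics
    return scaler.get_scale()
```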
], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:41:50,876 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.94 vs. limit=22.5 2023-11-26 12:41:53,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3393340.0, ans=0.05 2023-11-26 12:41:57,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3393406.6666666665, ans=0.1 2023-11-26 12:42:01,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3393406.6666666665, ans=0.0 2023-11-26 12:42:10,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3393473.3333333335, ans=0.2 2023-11-26 12:42:14,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3393473.3333333335, ans=0.0 2023-11-26 12:42:27,525 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.18 vs. limit=15.0 2023-11-26 12:42:39,945 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509050 2023-11-26 12:42:41,640 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.42 vs. limit=15.0 2023-11-26 12:42:43,162 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4050, loss[loss=0.06441, simple_loss=0.08817, pruned_loss=0.01073, audio_tagging_loss=0.009595, over 14496.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.09171, pruned_loss=0.01276, audio_tagging_loss=0.008857, over 3044231.98 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:42:46,557 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 12:42:47,337 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 12:42:53,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3393740.0, ans=0.1 2023-11-26 12:42:57,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3393740.0, ans=0.0 2023-11-26 12:43:03,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3393806.6666666665, ans=0.0 2023-11-26 12:43:27,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3393940.0, ans=0.2 2023-11-26 12:43:33,515 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 8.904e+01 9.389e+01 9.930e+01 1.705e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-26 12:43:35,713 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509100 2023-11-26 12:43:37,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3394006.6666666665, ans=0.0 2023-11-26 12:43:38,791 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4100, loss[loss=0.0758, simple_loss=0.1114, pruned_loss=0.01344, audio_tagging_loss=0.006681, over 15482.00 frames. ], tot_loss[loss=0.06781, simple_loss=0.092, pruned_loss=0.01285, audio_tagging_loss=0.008955, over 3040752.44 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:44:04,576 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.02 vs. limit=22.5 2023-11-26 12:44:30,942 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509150 2023-11-26 12:44:31,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3394273.3333333335, ans=0.2 2023-11-26 12:44:34,590 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4150, loss[loss=0.07549, simple_loss=0.1039, pruned_loss=0.01559, audio_tagging_loss=0.007971, over 16161.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09114, pruned_loss=0.01266, audio_tagging_loss=0.00888, over 3041019.84 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:44:39,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3394340.0, ans=0.2 2023-11-26 12:44:40,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3394340.0, ans=0.1 2023-11-26 12:44:41,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3394340.0, ans=0.125 2023-11-26 12:44:53,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3394406.6666666665, ans=0.125 2023-11-26 12:45:07,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3394540.0, ans=0.125 2023-11-26 12:45:17,401 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
2023-11-26 12:45:25,394 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.166e+01 8.865e+01 9.444e+01 1.016e+02 1.308e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-26 12:45:27,619 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509200 2023-11-26 12:45:31,516 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4200, loss[loss=0.05122, simple_loss=0.06672, pruned_loss=0.007728, audio_tagging_loss=0.01013, over 15506.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09107, pruned_loss=0.0126, audio_tagging_loss=0.008777, over 3048944.40 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:45:33,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3394673.3333333335, ans=0.125 2023-11-26 12:45:48,412 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.53 vs. limit=15.0 2023-11-26 12:45:49,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3394740.0, ans=0.07 2023-11-26 12:45:54,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3394806.6666666665, ans=0.2 2023-11-26 12:45:55,448 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 12:45:57,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3394806.6666666665, ans=0.0 2023-11-26 12:46:24,005 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509250 2023-11-26 12:46:25,637 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.22 vs. limit=15.0 2023-11-26 12:46:27,122 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4250, loss[loss=0.07195, simple_loss=0.1069, pruned_loss=0.01221, audio_tagging_loss=0.006278, over 16352.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09095, pruned_loss=0.01249, audio_tagging_loss=0.008688, over 3055648.37 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:46:30,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3395006.6666666665, ans=0.125 2023-11-26 12:46:34,982 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.81 vs. limit=15.0 2023-11-26 12:46:35,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3395006.6666666665, ans=0.125 2023-11-26 12:47:02,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3395206.6666666665, ans=0.05 2023-11-26 12:47:09,066 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.80 vs. limit=15.0
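The ubiquitous `[scaling.py:213] ScheduledFloat` entries print module hyperparameters (dropout rates, skip rates, balancer probabilities and similar) that are piecewise-linear functions of the global batch count rather than constants; `ans` is the value in effect at the given `batch_count`. At this point in training (batch_count around 3.39M) most schedules have long since flattened at their final values. A minimal re-implementation of the scheduling idea (the real class also behaves like a float in arithmetic; this only shows the interpolation):

```python
def scheduled_value(batch_count: float, points: list[tuple[float, float]]) -> float:
    """Piecewise-linear schedule over the global batch count.

    points: [(batch_count_0, value_0), (batch_count_1, value_1), ...],
    sorted by batch count; the value is clamped at both ends.
    """
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            frac = (batch_count - x0) / (x1 - x0)
            return y0 + frac * (y1 - y0)
    return points[-1][1]

# e.g. a dropout that decays from 0.3 to 0.1 over the first 20k batches:
# scheduled_value(3393740.0, [(0.0, 0.3), (20000.0, 0.1)]) -> 0.1
```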
2023-11-26 12:47:09,230 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2023-11-26 12:47:12,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3395273.3333333335, ans=0.125 2023-11-26 12:47:17,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3395273.3333333335, ans=0.025 2023-11-26 12:47:18,028 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.812e+01 8.820e+01 9.502e+01 1.020e+02 1.301e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 12:47:19,178 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509300 2023-11-26 12:47:22,845 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4300, loss[loss=0.06299, simple_loss=0.08349, pruned_loss=0.008421, audio_tagging_loss=0.01283, over 15621.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09058, pruned_loss=0.01231, audio_tagging_loss=0.008679, over 3050984.20 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:47:23,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3395340.0, ans=0.0 2023-11-26 12:47:25,716 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.57 vs. limit=10.0 2023-11-26 12:47:32,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3395340.0, ans=0.0 2023-11-26 12:47:50,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3395473.3333333335, ans=0.0 2023-11-26 12:48:01,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3395540.0, ans=0.1 2023-11-26 12:48:11,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3395606.6666666665, ans=0.125 2023-11-26 12:48:16,094 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509350 2023-11-26 12:48:16,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3395606.6666666665, ans=0.0 2023-11-26 12:48:18,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3395673.3333333335, ans=0.2 2023-11-26 12:48:19,154 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4350, loss[loss=0.08575, simple_loss=0.1154, pruned_loss=0.01837, audio_tagging_loss=0.009656, over 15285.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09095, pruned_loss=0.01235, audio_tagging_loss=0.008671, over 3042141.18 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:48:42,771 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.38 vs. limit=15.0
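The `[scaling.py:1022] Whitening` entries belong to a regularizer on hidden activations. As I understand the Whiten module, it estimates the channel covariance of a module's output and tracks a metric that equals 1.0 when the covariance is proportional to the identity (perfectly "white") and grows with the eigenvalue spread; an entry is logged when the measured metric is large relative to the configured limit, beyond which a gradient penalty pushes the activations back toward whiteness. The formula below is an assumption that reproduces that behavior, not a verbatim copy of scaling.py:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """x: (num_frames, num_channels). Returns mean(eig^2) / mean(eig)^2 of the
    per-group covariance, which is 1.0 iff all eigenvalues are equal."""
    num_frames, num_channels = x.shape
    assert num_channels % num_groups == 0
    cpg = num_channels // num_groups  # channels per group
    x = x.reshape(num_frames, num_groups, cpg).transpose(0, 1)
    covar = torch.matmul(x.transpose(1, 2), x) / num_frames  # (groups, cpg, cpg)
    mean_eig = covar.diagonal(dim1=1, dim2=2).mean()         # trace / channels
    mean_eig_sq = (covar ** 2).sum() / (num_groups * cpg)    # mean squared eig
    return mean_eig_sq / (mean_eig ** 2 + 1e-20)
```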
2023-11-26 12:48:44,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3395806.6666666665, ans=10.0 2023-11-26 12:49:07,860 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 12:49:10,771 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.277e+01 9.017e+01 9.685e+01 1.042e+02 1.339e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-26 12:49:11,949 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509400 2023-11-26 12:49:15,348 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4400, loss[loss=0.05721, simple_loss=0.08091, pruned_loss=0.01065, audio_tagging_loss=0.006098, over 15714.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09144, pruned_loss=0.01255, audio_tagging_loss=0.008636, over 3047718.81 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:50:04,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3396273.3333333335, ans=0.125 2023-11-26 12:50:07,542 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509450 2023-11-26 12:50:10,718 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4450, loss[loss=0.05682, simple_loss=0.07769, pruned_loss=0.008586, audio_tagging_loss=0.009389, over 14843.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09109, pruned_loss=0.01254, audio_tagging_loss=0.008635, over 3047733.75 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:50:13,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3396340.0, ans=0.125 2023-11-26 12:50:17,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3396340.0, ans=0.125 2023-11-26 12:50:22,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3396406.6666666665, ans=0.1 2023-11-26 12:50:23,096 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.87 vs. limit=15.0 2023-11-26 12:50:44,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3396540.0, ans=0.125 2023-11-26 12:50:46,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3396540.0, ans=0.0 2023-11-26 12:50:56,724 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.11 vs. 
limit=15.0 2023-11-26 12:51:00,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3396606.6666666665, ans=0.125 2023-11-26 12:51:00,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3396606.6666666665, ans=0.125 2023-11-26 12:51:03,423 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 8.973e+01 9.425e+01 1.012e+02 1.226e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 12:51:03,518 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509500 2023-11-26 12:51:07,258 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4500, loss[loss=0.05386, simple_loss=0.0775, pruned_loss=0.008147, audio_tagging_loss=0.006961, over 14726.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09202, pruned_loss=0.0126, audio_tagging_loss=0.00852, over 3048431.75 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:51:08,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3396673.3333333335, ans=0.2 2023-11-26 12:51:20,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3396740.0, ans=0.0 2023-11-26 12:51:27,785 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2023-11-26 12:51:32,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3396806.6666666665, ans=0.125 2023-11-26 12:51:35,950 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.19 vs. limit=15.0 2023-11-26 12:51:50,832 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.21 vs. limit=15.0 2023-11-26 12:51:52,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3396940.0, ans=0.1 2023-11-26 12:51:59,858 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509550 2023-11-26 12:52:01,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3396940.0, ans=0.125 2023-11-26 12:52:03,038 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4550, loss[loss=0.07981, simple_loss=0.1084, pruned_loss=0.01731, audio_tagging_loss=0.008301, over 15619.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09163, pruned_loss=0.01263, audio_tagging_loss=0.008517, over 3047216.16 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:52:14,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3397073.3333333335, ans=0.125 2023-11-26 12:52:31,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=3397140.0, ans=0.2 2023-11-26 12:52:35,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3397206.6666666665, ans=0.1 2023-11-26 12:52:35,508 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.51 vs. 
limit=15.0 2023-11-26 12:52:47,781 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 12:52:55,274 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509600 2023-11-26 12:52:56,220 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.419e+01 8.693e+01 9.289e+01 1.006e+02 1.287e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-26 12:52:57,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3397340.0, ans=0.125 2023-11-26 12:52:58,668 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4600, loss[loss=0.05854, simple_loss=0.07438, pruned_loss=0.01228, audio_tagging_loss=0.009066, over 14533.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09099, pruned_loss=0.0124, audio_tagging_loss=0.008622, over 3051950.65 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 8.0 2023-11-26 12:53:27,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3397473.3333333335, ans=0.2 2023-11-26 12:53:36,265 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.88 vs. limit=12.0 2023-11-26 12:53:43,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3397606.6666666665, ans=0.07 2023-11-26 12:53:51,231 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509650 2023-11-26 12:53:54,953 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4650, loss[loss=0.08354, simple_loss=0.117, pruned_loss=0.01469, audio_tagging_loss=0.01036, over 15408.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09056, pruned_loss=0.01237, audio_tagging_loss=0.008696, over 3049886.38 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 8.0 2023-11-26 12:54:00,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3397673.3333333335, ans=0.125 2023-11-26 12:54:24,297 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.28 vs. limit=12.0 2023-11-26 12:54:47,241 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=12.0 2023-11-26 12:54:48,064 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509700 2023-11-26 12:54:48,440 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=22.5 2023-11-26 12:54:49,071 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.259e+01 8.687e+01 9.484e+01 1.034e+02 1.399e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-26 12:54:51,720 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4700, loss[loss=0.06473, simple_loss=0.08144, pruned_loss=0.01626, audio_tagging_loss=0.007755, over 14546.00 frames. 
], tot_loss[loss=0.06606, simple_loss=0.08964, pruned_loss=0.01236, audio_tagging_loss=0.008875, over 3047544.74 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 8.0 2023-11-26 12:55:15,943 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0 2023-11-26 12:55:44,088 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509750 2023-11-26 12:55:47,209 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4750, loss[loss=0.05741, simple_loss=0.0735, pruned_loss=0.01126, audio_tagging_loss=0.009409, over 14710.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.0899, pruned_loss=0.0126, audio_tagging_loss=0.00892, over 3043778.20 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 8.0 2023-11-26 12:55:58,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3398406.6666666665, ans=0.125 2023-11-26 12:56:33,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3398606.6666666665, ans=0.125 2023-11-26 12:56:37,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3398606.6666666665, ans=0.125 2023-11-26 12:56:39,956 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509800 2023-11-26 12:56:40,890 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.582e+01 8.623e+01 9.231e+01 9.879e+01 9.064e+02, threshold=1.846e+02, percent-clipped=2.0 2023-11-26 12:56:43,264 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4800, loss[loss=0.06294, simple_loss=0.08554, pruned_loss=0.01036, audio_tagging_loss=0.009811, over 14885.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.08984, pruned_loss=0.01254, audio_tagging_loss=0.009081, over 3043286.07 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:56:54,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3398740.0, ans=0.0 2023-11-26 12:57:02,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3398740.0, ans=0.0 2023-11-26 12:57:16,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3398873.3333333335, ans=0.125 2023-11-26 12:57:35,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3398940.0, ans=0.125 2023-11-26 12:57:36,376 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509850 2023-11-26 12:57:39,520 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4850, loss[loss=0.05704, simple_loss=0.06728, pruned_loss=0.01184, audio_tagging_loss=0.01157, over 15875.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.08976, pruned_loss=0.01254, audio_tagging_loss=0.009221, over 3044370.84 frames. 
], batch size: 64, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:57:48,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3399006.6666666665, ans=0.2 2023-11-26 12:58:04,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3399140.0, ans=0.1 2023-11-26 12:58:26,757 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.43 vs. limit=15.0 2023-11-26 12:58:29,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3399273.3333333335, ans=0.125 2023-11-26 12:58:29,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3399273.3333333335, ans=0.1 2023-11-26 12:58:32,124 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509900 2023-11-26 12:58:33,028 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 8.785e+01 9.504e+01 1.038e+02 1.484e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-26 12:58:35,206 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4900, loss[loss=0.0502, simple_loss=0.06108, pruned_loss=0.01277, audio_tagging_loss=0.006886, over 13959.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08948, pruned_loss=0.0124, audio_tagging_loss=0.009133, over 3036728.50 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:58:43,380 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.68 vs. limit=15.0 2023-11-26 12:59:06,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3399473.3333333335, ans=0.125 2023-11-26 12:59:15,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3399540.0, ans=10.0 2023-11-26 12:59:27,651 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 509950 2023-11-26 12:59:30,723 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 4950, loss[loss=0.05217, simple_loss=0.0684, pruned_loss=0.008273, audio_tagging_loss=0.009699, over 15489.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08915, pruned_loss=0.01218, audio_tagging_loss=0.008986, over 3044230.86 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:59:50,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3399740.0, ans=0.0 2023-11-26 13:00:00,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3399806.6666666665, ans=0.0 2023-11-26 13:00:02,444 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.14 vs. 
limit=12.0 2023-11-26 13:00:22,376 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 13:00:23,371 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510000 2023-11-26 13:00:24,319 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.267e+01 8.558e+01 9.135e+01 1.006e+02 1.501e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-26 13:00:27,001 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5000, loss[loss=0.07325, simple_loss=0.09811, pruned_loss=0.01496, audio_tagging_loss=0.009237, over 14183.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08936, pruned_loss=0.01209, audio_tagging_loss=0.008799, over 3038922.26 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:00:28,657 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.97 vs. limit=10.0 2023-11-26 13:00:30,648 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.91 vs. limit=10.0 2023-11-26 13:00:34,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3400006.6666666665, ans=0.2 2023-11-26 13:00:48,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3400140.0, ans=0.0 2023-11-26 13:01:00,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3400206.6666666665, ans=0.1 2023-11-26 13:01:03,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3400206.6666666665, ans=6.0 2023-11-26 13:01:05,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3400206.6666666665, ans=0.0 2023-11-26 13:01:07,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3400206.6666666665, ans=0.0 2023-11-26 13:01:12,786 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.03 vs. limit=22.5 2023-11-26 13:01:18,669 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510050 2023-11-26 13:01:21,752 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5050, loss[loss=0.05539, simple_loss=0.07813, pruned_loss=0.008877, audio_tagging_loss=0.007449, over 15384.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08922, pruned_loss=0.01203, audio_tagging_loss=0.008659, over 3041458.45 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:01:37,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3400406.6666666665, ans=0.125 2023-11-26 13:01:59,263 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.94 vs. 
limit=12.0 2023-11-26 13:02:00,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3400540.0, ans=0.2 2023-11-26 13:02:04,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3400540.0, ans=0.04949747468305833 2023-11-26 13:02:14,023 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510100 2023-11-26 13:02:14,970 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.216e+01 8.485e+01 9.179e+01 9.722e+01 1.214e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-26 13:02:15,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3400606.6666666665, ans=0.125 2023-11-26 13:02:17,238 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.83 vs. limit=15.0 2023-11-26 13:02:17,666 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5100, loss[loss=0.06095, simple_loss=0.08299, pruned_loss=0.01004, audio_tagging_loss=0.009418, over 15309.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08912, pruned_loss=0.01196, audio_tagging_loss=0.008689, over 3043996.81 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:02:37,319 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.47 vs. limit=15.0 2023-11-26 13:02:42,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3400806.6666666665, ans=0.0 2023-11-26 13:02:57,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3400873.3333333335, ans=0.125 2023-11-26 13:03:10,330 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510150 2023-11-26 13:03:13,912 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5150, loss[loss=0.06833, simple_loss=0.09013, pruned_loss=0.01434, audio_tagging_loss=0.008923, over 15355.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08961, pruned_loss=0.01199, audio_tagging_loss=0.00868, over 3046246.74 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:03:18,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3401006.6666666665, ans=0.0 2023-11-26 13:03:20,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3401006.6666666665, ans=0.125 2023-11-26 13:03:26,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3401073.3333333335, ans=0.125 2023-11-26 13:03:51,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3401206.6666666665, ans=0.125 2023-11-26 13:03:57,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3401273.3333333335, ans=0.125 2023-11-26 13:04:06,041 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510200 2023-11-26 13:04:06,967 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.574e+01 8.866e+01 9.575e+01 1.024e+02 1.389e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-26 13:04:07,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3401273.3333333335, ans=0.125 2023-11-26 13:04:09,408 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5200, loss[loss=0.07907, simple_loss=0.1104, pruned_loss=0.01647, audio_tagging_loss=0.007401, over 15354.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08989, pruned_loss=0.01213, audio_tagging_loss=0.008727, over 3049908.69 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:04:09,818 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.88 vs. limit=15.0 2023-11-26 13:04:17,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3401340.0, ans=0.1 2023-11-26 13:04:18,325 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 13:04:28,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3401406.6666666665, ans=0.125 2023-11-26 13:05:01,178 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510250 2023-11-26 13:05:04,286 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5250, loss[loss=0.06189, simple_loss=0.08477, pruned_loss=0.009279, audio_tagging_loss=0.01023, over 15665.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08984, pruned_loss=0.01206, audio_tagging_loss=0.008688, over 3055745.50 frames. 
], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:05:08,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3401673.3333333335, ans=0.125 2023-11-26 13:05:51,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3401940.0, ans=0.125 2023-11-26 13:05:58,239 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510300 2023-11-26 13:06:00,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3402006.6666666665, ans=0.04949747468305833 2023-11-26 13:06:01,312 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.828e+01 9.543e+01 1.025e+02 1.295e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-26 13:06:01,368 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5300, loss[loss=0.05945, simple_loss=0.07109, pruned_loss=0.009688, audio_tagging_loss=0.01421, over 16542.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09043, pruned_loss=0.01223, audio_tagging_loss=0.008662, over 3060789.93 frames. ], batch size: 62, lr: 1.57e-03, grad_scale: 8.0 2023-11-26 13:06:13,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3402073.3333333335, ans=0.0 2023-11-26 13:06:32,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3402140.0, ans=0.125 2023-11-26 13:06:38,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3402206.6666666665, ans=0.125 2023-11-26 13:06:53,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3402273.3333333335, ans=0.125 2023-11-26 13:06:54,128 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510350 2023-11-26 13:06:57,228 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5350, loss[loss=0.04919, simple_loss=0.06047, pruned_loss=0.008672, audio_tagging_loss=0.01028, over 15372.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08981, pruned_loss=0.01218, audio_tagging_loss=0.008714, over 3056551.22 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 8.0 2023-11-26 13:07:43,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3402606.6666666665, ans=0.2 2023-11-26 13:07:49,197 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510400 2023-11-26 13:07:52,616 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.116e+01 8.825e+01 9.466e+01 1.015e+02 1.457e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 13:07:52,643 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5400, loss[loss=0.06546, simple_loss=0.08598, pruned_loss=0.01552, audio_tagging_loss=0.006947, over 15505.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08936, pruned_loss=0.01213, audio_tagging_loss=0.008698, over 3051976.24 frames. 
], batch size: 59, lr: 1.57e-03, grad_scale: 8.0 2023-11-26 13:07:54,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3402673.3333333335, ans=0.0 2023-11-26 13:07:57,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3402673.3333333335, ans=0.2 2023-11-26 13:08:45,450 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510450 2023-11-26 13:08:49,164 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5450, loss[loss=0.07734, simple_loss=0.09688, pruned_loss=0.01993, audio_tagging_loss=0.00896, over 14762.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.0902, pruned_loss=0.01231, audio_tagging_loss=0.008707, over 3051258.53 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 8.0 2023-11-26 13:09:12,145 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.90 vs. limit=12.0 2023-11-26 13:09:19,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3403140.0, ans=0.025 2023-11-26 13:09:26,571 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.49 vs. limit=15.0 2023-11-26 13:09:34,180 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.48 vs. limit=6.0 2023-11-26 13:09:38,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3403273.3333333335, ans=0.1 2023-11-26 13:09:41,531 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510500 2023-11-26 13:09:44,641 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.143e+01 8.724e+01 9.189e+01 1.004e+02 1.414e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-26 13:09:44,670 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5500, loss[loss=0.0664, simple_loss=0.08937, pruned_loss=0.01426, audio_tagging_loss=0.007455, over 15641.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09114, pruned_loss=0.01247, audio_tagging_loss=0.008726, over 3052058.06 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 8.0 2023-11-26 13:09:44,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3403340.0, ans=0.125 2023-11-26 13:09:58,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3403406.6666666665, ans=0.125 2023-11-26 13:10:01,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3403406.6666666665, ans=0.0 2023-11-26 13:10:13,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3403473.3333333335, ans=0.125 2023-11-26 13:10:15,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3403473.3333333335, ans=0.125 2023-11-26 13:10:37,159 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510550 2023-11-26 13:10:40,232 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5550, loss[loss=0.06222, simple_loss=0.08464, pruned_loss=0.01065, audio_tagging_loss=0.009251, over 14730.00 frames. 
], tot_loss[loss=0.06684, simple_loss=0.09108, pruned_loss=0.01253, audio_tagging_loss=0.008763, over 3053343.90 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 8.0 2023-11-26 13:10:41,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3403673.3333333335, ans=0.125 2023-11-26 13:10:49,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3403673.3333333335, ans=0.125 2023-11-26 13:10:51,592 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 13:11:11,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3403806.6666666665, ans=0.1 2023-11-26 13:11:17,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3403873.3333333335, ans=0.1 2023-11-26 13:11:17,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3403873.3333333335, ans=0.0 2023-11-26 13:11:27,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3403940.0, ans=0.2 2023-11-26 13:11:28,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3403940.0, ans=0.0 2023-11-26 13:11:28,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3403940.0, ans=0.0 2023-11-26 13:11:30,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3403940.0, ans=0.125 2023-11-26 13:11:32,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3403940.0, ans=0.0 2023-11-26 13:11:33,002 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510600 2023-11-26 13:11:36,393 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.069e+01 8.941e+01 9.582e+01 1.033e+02 2.288e+02, threshold=1.916e+02, percent-clipped=1.0 2023-11-26 13:11:36,420 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5600, loss[loss=0.05181, simple_loss=0.05916, pruned_loss=0.008314, audio_tagging_loss=0.01392, over 14876.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09047, pruned_loss=0.01243, audio_tagging_loss=0.00897, over 3050744.73 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:11:41,779 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2023-11-26 13:11:55,415 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 13:12:02,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3404140.0, ans=0.0 2023-11-26 13:12:18,024 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 13:12:27,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3404273.3333333335, ans=0.2 2023-11-26 13:12:30,154 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510650 2023-11-26 13:12:31,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3404273.3333333335, ans=0.125 2023-11-26 13:12:31,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3404273.3333333335, ans=0.125 2023-11-26 13:12:33,244 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5650, loss[loss=0.05827, simple_loss=0.06832, pruned_loss=0.01163, audio_tagging_loss=0.01249, over 14920.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09031, pruned_loss=0.01247, audio_tagging_loss=0.008986, over 3055029.74 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:12:44,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3404406.6666666665, ans=0.125 2023-11-26 13:12:57,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3404473.3333333335, ans=0.125 2023-11-26 13:12:59,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3404473.3333333335, ans=0.1 2023-11-26 13:13:00,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3404473.3333333335, ans=0.07 2023-11-26 13:13:01,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3404473.3333333335, ans=0.035 2023-11-26 13:13:09,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3404540.0, ans=0.0 2023-11-26 13:13:11,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3404540.0, ans=0.125 2023-11-26 13:13:19,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3404606.6666666665, ans=0.0 2023-11-26 13:13:25,405 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510700 2023-11-26 13:13:25,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3404606.6666666665, ans=0.09899494936611666 2023-11-26 13:13:26,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3404606.6666666665, ans=0.025 2023-11-26 13:13:28,508 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.526e+01 8.672e+01 9.212e+01 9.928e+01 1.414e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-26 13:13:28,535 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5700, loss[loss=0.05925, simple_loss=0.08088, pruned_loss=0.01112, audio_tagging_loss=0.007689, over 13767.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08917, pruned_loss=0.01236, audio_tagging_loss=0.008959, over 3053466.70 frames. 
], batch size: 52, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:13:28,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3404673.3333333335, ans=0.2 2023-11-26 13:13:30,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3404673.3333333335, ans=0.95 2023-11-26 13:13:49,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3404740.0, ans=0.125 2023-11-26 13:13:57,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3404806.6666666665, ans=0.0 2023-11-26 13:14:03,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3404873.3333333335, ans=0.0 2023-11-26 13:14:03,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3404873.3333333335, ans=0.2 2023-11-26 13:14:17,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3404940.0, ans=6.0 2023-11-26 13:14:21,359 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510750 2023-11-26 13:14:24,488 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5750, loss[loss=0.05065, simple_loss=0.0634, pruned_loss=0.007561, audio_tagging_loss=0.01139, over 15132.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.08976, pruned_loss=0.01249, audio_tagging_loss=0.008924, over 3055094.13 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:14:45,736 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.07 vs. limit=10.0 2023-11-26 13:14:48,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3405140.0, ans=0.2 2023-11-26 13:14:50,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3405140.0, ans=0.125 2023-11-26 13:14:57,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3405206.6666666665, ans=0.125 2023-11-26 13:14:59,941 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.29 vs. limit=15.0 2023-11-26 13:15:17,194 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510800 2023-11-26 13:15:20,820 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.344e+01 8.469e+01 9.302e+01 1.019e+02 1.569e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-26 13:15:20,848 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5800, loss[loss=0.07182, simple_loss=0.09777, pruned_loss=0.01325, audio_tagging_loss=0.00968, over 15491.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08949, pruned_loss=0.01232, audio_tagging_loss=0.008882, over 3056046.01 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:15:33,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3405406.6666666665, ans=0.125 2023-11-26 13:15:41,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3405473.3333333335, ans=0.125 2023-11-26 13:15:50,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3405473.3333333335, ans=0.0 2023-11-26 13:15:54,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3405540.0, ans=0.1 2023-11-26 13:16:02,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3405540.0, ans=0.0 2023-11-26 13:16:13,340 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510850 2023-11-26 13:16:16,493 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5850, loss[loss=0.08605, simple_loss=0.1256, pruned_loss=0.01585, audio_tagging_loss=0.00741, over 15700.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08892, pruned_loss=0.01221, audio_tagging_loss=0.008829, over 3058306.76 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:16:20,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3405673.3333333335, ans=0.2 2023-11-26 13:16:43,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3405806.6666666665, ans=0.2 2023-11-26 13:16:48,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3405873.3333333335, ans=0.0 2023-11-26 13:17:04,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3405940.0, ans=0.0 2023-11-26 13:17:08,206 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510900 2023-11-26 13:17:09,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3405940.0, ans=0.125 2023-11-26 13:17:11,771 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.196e+01 8.774e+01 9.383e+01 1.009e+02 2.236e+02, threshold=1.877e+02, percent-clipped=1.0 2023-11-26 13:17:11,809 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5900, loss[loss=0.06762, simple_loss=0.0889, pruned_loss=0.01349, audio_tagging_loss=0.009685, over 14796.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08941, pruned_loss=0.01224, audio_tagging_loss=0.008715, over 3055660.09 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:17:28,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3406073.3333333335, ans=0.2 2023-11-26 13:17:38,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3406140.0, ans=0.2 2023-11-26 13:17:46,902 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.09 vs. 
limit=15.0 2023-11-26 13:17:51,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3406206.6666666665, ans=0.1 2023-11-26 13:17:55,565 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.86 vs. limit=15.0 2023-11-26 13:18:00,530 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.61 vs. limit=15.0 2023-11-26 13:18:04,171 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 510950 2023-11-26 13:18:06,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3406340.0, ans=0.0 2023-11-26 13:18:07,246 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 5950, loss[loss=0.07315, simple_loss=0.0902, pruned_loss=0.01823, audio_tagging_loss=0.009815, over 15107.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08861, pruned_loss=0.01203, audio_tagging_loss=0.00886, over 3058031.45 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:18:14,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3406340.0, ans=0.2 2023-11-26 13:18:17,259 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.79 vs. limit=12.0 2023-11-26 13:18:44,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3406540.0, ans=0.125 2023-11-26 13:18:45,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3406540.0, ans=0.125 2023-11-26 13:19:00,370 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511000 2023-11-26 13:19:01,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3406606.6666666665, ans=0.125 2023-11-26 13:19:03,705 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.880e+01 8.762e+01 9.206e+01 9.891e+01 1.298e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-26 13:19:03,734 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6000, loss[loss=0.04277, simple_loss=0.05085, pruned_loss=0.005771, audio_tagging_loss=0.01158, over 14352.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08953, pruned_loss=0.0122, audio_tagging_loss=0.008744, over 3056357.77 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:19:03,736 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-26 13:19:25,586 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3515, 5.0201, 4.6814, 5.1966], device='cuda:0') 2023-11-26 13:19:28,721 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.1954, 4.0193, 3.7700, 3.2959], device='cuda:0') 2023-11-26 13:19:36,330 INFO [train_asr.py:1267] (0/4) Epoch 43, validation: loss=0.05784, simple_loss=0.05057, pruned_loss=0.005191, audio_tagging_loss=0.02736, over 4681554.00 frames. 
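The loss fields in these records are consistent with a single weighted sum, total = 0.5 x simple_loss + pruned_loss + audio_tagging_loss: for the validation entry above, 0.5 x 0.05057 + 0.005191 + 0.02736 ≈ 0.05784, and the training records check out the same way (e.g. batch 4050: 0.5 x 0.08817 + 0.01073 + 0.009595 = 0.06441). In each record, `loss[...]` is the current batch and `tot_loss[...]` a running average over roughly 3M frames; note how much larger the audio-tagging share is on the validation set. A quick check, assuming (but not certain) that this is the exact combination used:

```python
def total_loss(simple_loss, pruned_loss, audio_tagging_loss,
               simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
    # Assumed combination of the logged loss components.
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Validation entry above: loss=0.05784
assert abs(total_loss(0.05057, 0.005191, 0.02736) - 0.05784) < 1e-4
# Epoch 43, batch 4050: loss=0.06441
assert abs(total_loss(0.08817, 0.01073, 0.009595) - 0.06441) < 1e-4
```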
2023-11-26 13:19:36,330 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-26 13:19:49,488 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.34 vs. limit=15.0 2023-11-26 13:19:50,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3406740.0, ans=0.1 2023-11-26 13:20:05,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=3406806.6666666665, ans=15.0 2023-11-26 13:20:06,981 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.99 vs. limit=15.0 2023-11-26 13:20:16,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3406873.3333333335, ans=0.2 2023-11-26 13:20:17,568 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 13:20:18,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3406873.3333333335, ans=0.2 2023-11-26 13:20:24,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3406940.0, ans=0.2 2023-11-26 13:20:28,756 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511050 2023-11-26 13:20:32,350 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6050, loss[loss=0.06434, simple_loss=0.08316, pruned_loss=0.0117, audio_tagging_loss=0.01106, over 15370.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08987, pruned_loss=0.01211, audio_tagging_loss=0.008756, over 3057095.46 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:20:57,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3407140.0, ans=0.0 2023-11-26 13:20:58,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3407140.0, ans=0.125 2023-11-26 13:21:10,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3407206.6666666665, ans=0.125 2023-11-26 13:21:14,791 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.67 vs. limit=15.0 2023-11-26 13:21:16,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3407273.3333333335, ans=0.125 2023-11-26 13:21:23,753 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511100 2023-11-26 13:21:27,506 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6100, loss[loss=0.08226, simple_loss=0.1126, pruned_loss=0.01884, audio_tagging_loss=0.007149, over 15175.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.0903, pruned_loss=0.01222, audio_tagging_loss=0.00876, over 3056310.33 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:21:28,189 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.93 vs. limit=22.5 2023-11-26 13:21:28,499 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.895e+01 8.831e+01 9.526e+01 1.012e+02 1.265e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-26 13:21:37,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3407406.6666666665, ans=0.0 2023-11-26 13:21:51,166 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.43 vs. limit=15.0 2023-11-26 13:21:55,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3407473.3333333335, ans=0.125 2023-11-26 13:22:11,226 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.67 vs. limit=12.0 2023-11-26 13:22:18,897 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.56 vs. limit=10.0 2023-11-26 13:22:19,345 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511150 2023-11-26 13:22:19,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3407606.6666666665, ans=0.0 2023-11-26 13:22:23,026 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6150, loss[loss=0.05667, simple_loss=0.07531, pruned_loss=0.009813, audio_tagging_loss=0.009207, over 15292.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.0902, pruned_loss=0.01229, audio_tagging_loss=0.00871, over 3051024.82 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:22:24,740 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.35 vs. limit=15.0 2023-11-26 13:22:26,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3407673.3333333335, ans=0.0 2023-11-26 13:22:31,560 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.75 vs. limit=15.0 2023-11-26 13:22:34,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3407740.0, ans=10.0 2023-11-26 13:22:42,797 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.34 vs. limit=12.0 2023-11-26 13:22:44,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3407806.6666666665, ans=0.125 2023-11-26 13:23:00,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3407873.3333333335, ans=0.125 2023-11-26 13:23:15,412 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511200 2023-11-26 13:23:19,291 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6200, loss[loss=0.07799, simple_loss=0.1127, pruned_loss=0.01493, audio_tagging_loss=0.006698, over 15019.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08967, pruned_loss=0.01221, audio_tagging_loss=0.00881, over 3050594.52 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:23:20,343 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.166e+01 8.535e+01 9.196e+01 1.004e+02 1.259e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-26 13:23:32,298 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 13:23:39,700 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.28 vs. limit=22.5 2023-11-26 13:23:48,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3408140.0, ans=0.125 2023-11-26 13:23:48,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3408140.0, ans=0.125 2023-11-26 13:23:50,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3408140.0, ans=0.125 2023-11-26 13:24:06,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3408273.3333333335, ans=0.125 2023-11-26 13:24:08,943 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.04 vs. limit=12.0 2023-11-26 13:24:11,506 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511250 2023-11-26 13:24:12,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3408273.3333333335, ans=0.0 2023-11-26 13:24:14,670 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6250, loss[loss=0.05095, simple_loss=0.06332, pruned_loss=0.008472, audio_tagging_loss=0.01082, over 15607.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09012, pruned_loss=0.01233, audio_tagging_loss=0.00884, over 3048777.47 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:24:23,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3408340.0, ans=0.1 2023-11-26 13:24:45,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3408473.3333333335, ans=0.0 2023-11-26 13:25:07,745 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511300 2023-11-26 13:25:10,840 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6300, loss[loss=0.04707, simple_loss=0.05921, pruned_loss=0.006997, audio_tagging_loss=0.01046, over 15391.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09077, pruned_loss=0.01259, audio_tagging_loss=0.008869, over 3043820.43 frames. 
], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:25:12,443 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.768e+01 8.818e+01 9.508e+01 1.037e+02 1.214e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-26 13:25:12,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3408673.3333333335, ans=0.125 2023-11-26 13:25:29,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3408740.0, ans=0.125 2023-11-26 13:25:36,267 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.29 vs. limit=22.5 2023-11-26 13:25:45,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3408873.3333333335, ans=0.0 2023-11-26 13:25:57,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3408940.0, ans=0.04949747468305833 2023-11-26 13:26:04,127 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511350 2023-11-26 13:26:07,286 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6350, loss[loss=0.06304, simple_loss=0.09504, pruned_loss=0.006055, audio_tagging_loss=0.009469, over 15270.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09045, pruned_loss=0.01251, audio_tagging_loss=0.008955, over 3046114.20 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:26:07,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3409006.6666666665, ans=0.125 2023-11-26 13:26:37,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3409140.0, ans=0.09899494936611666 2023-11-26 13:26:44,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3409206.6666666665, ans=0.0 2023-11-26 13:26:59,928 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511400 2023-11-26 13:27:00,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3409273.3333333335, ans=0.125 2023-11-26 13:27:03,272 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6400, loss[loss=0.06779, simple_loss=0.09375, pruned_loss=0.01202, audio_tagging_loss=0.008893, over 15577.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09068, pruned_loss=0.01241, audio_tagging_loss=0.008917, over 3045296.50 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:27:04,273 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.869e+01 8.943e+01 9.499e+01 1.031e+02 1.393e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 13:27:29,795 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.59 vs. limit=12.0 2023-11-26 13:27:32,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3409473.3333333335, ans=0.125 2023-11-26 13:27:42,138 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.90 vs. 
limit=15.0 2023-11-26 13:27:44,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3409540.0, ans=0.1 2023-11-26 13:27:55,511 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511450 2023-11-26 13:27:58,656 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6450, loss[loss=0.06628, simple_loss=0.08023, pruned_loss=0.0133, audio_tagging_loss=0.01287, over 15104.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08996, pruned_loss=0.01226, audio_tagging_loss=0.009038, over 3043601.27 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:28:09,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3409740.0, ans=0.2 2023-11-26 13:28:10,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3409740.0, ans=0.125 2023-11-26 13:28:21,430 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.17 vs. limit=15.0 2023-11-26 13:28:40,516 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.25 vs. limit=15.0 2023-11-26 13:28:51,652 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511500 2023-11-26 13:28:54,730 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6500, loss[loss=0.07409, simple_loss=0.09296, pruned_loss=0.01831, audio_tagging_loss=0.009301, over 14179.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08965, pruned_loss=0.01228, audio_tagging_loss=0.009059, over 3042162.85 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:28:55,795 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.162e+01 8.812e+01 9.257e+01 9.962e+01 1.246e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-26 13:28:56,526 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.10 vs. 
limit=15.0 2023-11-26 13:29:01,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3410006.6666666665, ans=0.125 2023-11-26 13:29:09,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3410073.3333333335, ans=0.1 2023-11-26 13:29:11,499 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 13:29:12,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3410073.3333333335, ans=0.125 2023-11-26 13:29:15,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3410140.0, ans=0.1 2023-11-26 13:29:19,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3410140.0, ans=0.2 2023-11-26 13:29:34,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3410206.6666666665, ans=0.125 2023-11-26 13:29:39,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3410273.3333333335, ans=0.125 2023-11-26 13:29:43,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3410273.3333333335, ans=0.1 2023-11-26 13:29:43,850 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.79 vs. limit=15.0 2023-11-26 13:29:47,562 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511550 2023-11-26 13:29:50,641 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6550, loss[loss=0.06463, simple_loss=0.08169, pruned_loss=0.01326, audio_tagging_loss=0.01053, over 14764.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08941, pruned_loss=0.01221, audio_tagging_loss=0.008867, over 3044824.31 frames. 
], batch size: 54, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:29:50,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3410340.0, ans=0.125 2023-11-26 13:30:05,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3410406.6666666665, ans=0.0 2023-11-26 13:30:17,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3410473.3333333335, ans=0.1 2023-11-26 13:30:20,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3410473.3333333335, ans=0.125 2023-11-26 13:30:21,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3410473.3333333335, ans=0.1 2023-11-26 13:30:23,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3410540.0, ans=0.125 2023-11-26 13:30:26,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3410540.0, ans=10.0 2023-11-26 13:30:42,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3410606.6666666665, ans=0.125 2023-11-26 13:30:42,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3410606.6666666665, ans=0.1 2023-11-26 13:30:42,995 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511600 2023-11-26 13:30:45,864 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.51 vs. limit=15.0 2023-11-26 13:30:46,406 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6600, loss[loss=0.04893, simple_loss=0.06546, pruned_loss=0.008165, audio_tagging_loss=0.008034, over 14803.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08912, pruned_loss=0.01231, audio_tagging_loss=0.008776, over 3044928.41 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:30:47,485 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.564e+01 8.677e+01 9.398e+01 1.026e+02 1.405e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-26 13:30:54,985 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.75 vs. limit=6.0 2023-11-26 13:31:17,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3410806.6666666665, ans=0.0 2023-11-26 13:31:27,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3410873.3333333335, ans=0.0 2023-11-26 13:31:28,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3410873.3333333335, ans=0.0 2023-11-26 13:31:39,000 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511650 2023-11-26 13:31:42,721 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6650, loss[loss=0.08492, simple_loss=0.1313, pruned_loss=0.01424, audio_tagging_loss=0.005018, over 15163.00 frames. 
], tot_loss[loss=0.06547, simple_loss=0.08873, pruned_loss=0.01232, audio_tagging_loss=0.00878, over 3044828.22 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:31:44,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3411006.6666666665, ans=0.1 2023-11-26 13:31:51,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3411006.6666666665, ans=0.09899494936611666 2023-11-26 13:32:13,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3411140.0, ans=0.125 2023-11-26 13:32:24,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3411206.6666666665, ans=0.125 2023-11-26 13:32:24,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3411206.6666666665, ans=0.1 2023-11-26 13:32:28,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3411273.3333333335, ans=0.0 2023-11-26 13:32:35,541 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511700 2023-11-26 13:32:38,681 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6700, loss[loss=0.08932, simple_loss=0.1291, pruned_loss=0.01769, audio_tagging_loss=0.0071, over 14910.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08923, pruned_loss=0.01232, audio_tagging_loss=0.008759, over 3033546.37 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:32:40,759 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.817e+01 8.754e+01 9.381e+01 1.004e+02 1.497e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-26 13:32:46,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3411340.0, ans=0.04949747468305833 2023-11-26 13:32:51,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3411406.6666666665, ans=0.125 2023-11-26 13:33:03,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3411473.3333333335, ans=0.125 2023-11-26 13:33:18,776 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=15.0 2023-11-26 13:33:25,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3411606.6666666665, ans=0.125 2023-11-26 13:33:31,088 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511750 2023-11-26 13:33:34,181 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6750, loss[loss=0.08299, simple_loss=0.1059, pruned_loss=0.01857, audio_tagging_loss=0.01147, over 15575.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08941, pruned_loss=0.01234, audio_tagging_loss=0.008797, over 3030828.09 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:33:36,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3411673.3333333335, ans=0.0 2023-11-26 13:33:42,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3411673.3333333335, ans=0.125 2023-11-26 13:33:46,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3411740.0, ans=0.125 2023-11-26 13:33:53,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3411740.0, ans=0.2 2023-11-26 13:34:00,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3411806.6666666665, ans=0.0 2023-11-26 13:34:03,947 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 13:34:07,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3411873.3333333335, ans=0.0 2023-11-26 13:34:11,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3411873.3333333335, ans=0.125 2023-11-26 13:34:17,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3411940.0, ans=0.125 2023-11-26 13:34:26,690 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511800 2023-11-26 13:34:29,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3412006.6666666665, ans=0.2 2023-11-26 13:34:30,029 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6800, loss[loss=0.05524, simple_loss=0.06952, pruned_loss=0.009781, audio_tagging_loss=0.0107, over 14846.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08962, pruned_loss=0.01239, audio_tagging_loss=0.008779, over 3031199.50 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:34:32,716 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.165e+01 8.841e+01 9.365e+01 1.006e+02 1.409e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 13:34:43,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3412073.3333333335, ans=0.0 2023-11-26 13:34:44,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3412073.3333333335, ans=0.1 2023-11-26 13:34:51,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3412073.3333333335, ans=0.0 2023-11-26 13:34:52,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3412140.0, ans=0.2 2023-11-26 13:34:55,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3412140.0, ans=0.125 2023-11-26 13:34:59,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3412140.0, ans=0.2 2023-11-26 13:35:24,232 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511850 2023-11-26 13:35:27,357 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6850, loss[loss=0.06449, simple_loss=0.09105, pruned_loss=0.01178, audio_tagging_loss=0.007192, over 14974.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09084, pruned_loss=0.01242, audio_tagging_loss=0.008621, over 3045274.52 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:35:33,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3412340.0, ans=0.0 2023-11-26 13:35:38,499 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.11 vs. limit=15.0 2023-11-26 13:35:40,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3412406.6666666665, ans=0.125 2023-11-26 13:36:17,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3412606.6666666665, ans=0.125 2023-11-26 13:36:19,392 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511900 2023-11-26 13:36:22,523 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6900, loss[loss=0.07273, simple_loss=0.1033, pruned_loss=0.01425, audio_tagging_loss=0.006842, over 16040.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09006, pruned_loss=0.01233, audio_tagging_loss=0.008718, over 3047894.33 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:36:24,652 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 8.611e+01 9.198e+01 9.954e+01 1.491e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-26 13:36:26,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3412673.3333333335, ans=0.0 2023-11-26 13:37:01,342 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 13:37:08,634 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 13:37:12,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3412940.0, ans=0.0 2023-11-26 13:37:12,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3412940.0, ans=0.05 2023-11-26 13:37:15,718 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 511950 2023-11-26 13:37:18,949 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 6950, loss[loss=0.05859, simple_loss=0.08023, pruned_loss=0.01087, audio_tagging_loss=0.007603, over 14913.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08919, pruned_loss=0.01205, audio_tagging_loss=0.008753, over 3043734.70 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:37:24,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3413006.6666666665, ans=0.125 2023-11-26 13:37:26,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3413006.6666666665, ans=0.2 2023-11-26 13:37:28,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3413006.6666666665, ans=0.125 2023-11-26 13:37:32,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3413073.3333333335, ans=0.1 2023-11-26 13:37:45,767 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.08 vs. limit=15.0 2023-11-26 13:37:56,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3413206.6666666665, ans=0.1 2023-11-26 13:38:11,791 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512000 2023-11-26 13:38:13,151 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-512000.pt 2023-11-26 13:38:17,719 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7000, loss[loss=0.09116, simple_loss=0.1246, pruned_loss=0.02108, audio_tagging_loss=0.007785, over 14777.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08964, pruned_loss=0.01221, audio_tagging_loss=0.008745, over 3042445.10 frames. 
], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:38:20,433 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.849e+01 8.732e+01 9.495e+01 1.005e+02 2.082e+02, threshold=1.899e+02, percent-clipped=1.0 2023-11-26 13:38:31,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3413406.6666666665, ans=0.1 2023-11-26 13:38:44,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3413473.3333333335, ans=0.0 2023-11-26 13:39:10,702 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512050 2023-11-26 13:39:13,844 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7050, loss[loss=0.0549, simple_loss=0.07072, pruned_loss=0.01116, audio_tagging_loss=0.008378, over 14893.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08994, pruned_loss=0.01237, audio_tagging_loss=0.008781, over 3042738.88 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:39:15,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3413673.3333333335, ans=0.0 2023-11-26 13:39:27,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3413740.0, ans=0.125 2023-11-26 13:39:46,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3413873.3333333335, ans=0.125 2023-11-26 13:39:59,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3413940.0, ans=0.5 2023-11-26 13:40:06,137 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512100 2023-11-26 13:40:09,740 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7100, loss[loss=0.07309, simple_loss=0.1044, pruned_loss=0.0126, audio_tagging_loss=0.008308, over 15277.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09008, pruned_loss=0.0123, audio_tagging_loss=0.008806, over 3036064.21 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:40:12,859 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.221e+01 8.711e+01 9.572e+01 1.021e+02 1.655e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-26 13:40:34,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3414140.0, ans=0.0 2023-11-26 13:40:50,252 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.78 vs. limit=15.0 2023-11-26 13:41:00,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3414273.3333333335, ans=0.0 2023-11-26 13:41:02,424 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512150 2023-11-26 13:41:04,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3414340.0, ans=0.125 2023-11-26 13:41:05,580 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7150, loss[loss=0.06623, simple_loss=0.09045, pruned_loss=0.01112, audio_tagging_loss=0.00988, over 14121.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09028, pruned_loss=0.01237, audio_tagging_loss=0.008927, over 3045118.00 frames. 
], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:41:21,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3414406.6666666665, ans=0.125 2023-11-26 13:41:49,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3414606.6666666665, ans=0.125 2023-11-26 13:41:51,380 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.43 vs. limit=12.0 2023-11-26 13:41:58,338 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512200 2023-11-26 13:42:02,392 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7200, loss[loss=0.05711, simple_loss=0.07341, pruned_loss=0.008644, audio_tagging_loss=0.01176, over 15192.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08941, pruned_loss=0.01222, audio_tagging_loss=0.00897, over 3042459.90 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:42:05,620 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.870e+01 8.947e+01 9.542e+01 1.037e+02 1.437e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-26 13:42:11,417 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.82 vs. limit=10.0 2023-11-26 13:42:12,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3414740.0, ans=0.125 2023-11-26 13:42:15,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3414740.0, ans=0.125 2023-11-26 13:42:17,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.77 vs. limit=15.0 2023-11-26 13:42:19,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3414740.0, ans=0.125 2023-11-26 13:42:24,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3414806.6666666665, ans=0.1 2023-11-26 13:42:36,623 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.00 vs. limit=10.0 2023-11-26 13:42:39,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3414873.3333333335, ans=0.125 2023-11-26 13:42:54,843 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512250 2023-11-26 13:42:57,949 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7250, loss[loss=0.07243, simple_loss=0.1039, pruned_loss=0.01169, audio_tagging_loss=0.008804, over 16189.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09005, pruned_loss=0.01228, audio_tagging_loss=0.008962, over 3047419.96 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:43:08,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3415073.3333333335, ans=0.04949747468305833 2023-11-26 13:43:16,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3415073.3333333335, ans=0.125 2023-11-26 13:43:20,233 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.58 vs. limit=22.5 2023-11-26 13:43:26,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3415140.0, ans=0.5 2023-11-26 13:43:34,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3415206.6666666665, ans=0.125 2023-11-26 13:43:35,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=3415206.6666666665, ans=0.1 2023-11-26 13:43:51,085 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512300 2023-11-26 13:43:54,218 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7300, loss[loss=0.06284, simple_loss=0.07851, pruned_loss=0.01375, audio_tagging_loss=0.00983, over 14911.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09037, pruned_loss=0.01238, audio_tagging_loss=0.008871, over 3043746.20 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:43:54,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3415340.0, ans=0.125 2023-11-26 13:43:54,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3415340.0, ans=0.1 2023-11-26 13:43:59,020 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.665e+01 8.745e+01 9.385e+01 1.003e+02 1.402e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 13:44:13,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3415406.6666666665, ans=0.125 2023-11-26 13:44:14,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3415406.6666666665, ans=0.0 2023-11-26 13:44:22,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3415473.3333333335, ans=0.125 2023-11-26 13:44:23,603 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.89 vs. limit=15.0 2023-11-26 13:44:27,396 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.14 vs. 
limit=15.0 2023-11-26 13:44:36,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3415540.0, ans=0.1 2023-11-26 13:44:38,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3415606.6666666665, ans=0.2 2023-11-26 13:44:47,373 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512350 2023-11-26 13:44:48,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3415606.6666666665, ans=0.125 2023-11-26 13:44:50,500 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7350, loss[loss=0.04864, simple_loss=0.06688, pruned_loss=0.006637, audio_tagging_loss=0.008564, over 14835.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09138, pruned_loss=0.01263, audio_tagging_loss=0.008722, over 3045957.49 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:44:53,264 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.34 vs. limit=22.5 2023-11-26 13:44:54,297 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.27 vs. limit=15.0 2023-11-26 13:45:16,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3415806.6666666665, ans=0.0 2023-11-26 13:45:27,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3415873.3333333335, ans=0.05 2023-11-26 13:45:43,393 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512400 2023-11-26 13:45:43,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3415940.0, ans=0.0 2023-11-26 13:45:46,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3416006.6666666665, ans=0.0 2023-11-26 13:45:46,739 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7400, loss[loss=0.05979, simple_loss=0.07816, pruned_loss=0.01298, audio_tagging_loss=0.007729, over 15216.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09033, pruned_loss=0.01249, audio_tagging_loss=0.008643, over 3047083.98 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:45:50,936 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.547e+01 8.921e+01 9.521e+01 1.008e+02 1.264e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-26 13:45:51,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3416006.6666666665, ans=0.125 2023-11-26 13:45:58,099 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.43 vs. limit=15.0 2023-11-26 13:46:02,139 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.96 vs. 
limit=10.0 2023-11-26 13:46:08,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3416140.0, ans=0.0 2023-11-26 13:46:18,632 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=15.0 2023-11-26 13:46:27,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3416206.6666666665, ans=0.125 2023-11-26 13:46:40,210 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512450 2023-11-26 13:46:43,237 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7450, loss[loss=0.06829, simple_loss=0.09315, pruned_loss=0.01361, audio_tagging_loss=0.008104, over 14827.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09022, pruned_loss=0.01254, audio_tagging_loss=0.008624, over 3050954.17 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:46:50,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3416340.0, ans=0.0 2023-11-26 13:47:10,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3416473.3333333335, ans=0.0 2023-11-26 13:47:31,999 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2023-11-26 13:47:34,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3416606.6666666665, ans=0.125 2023-11-26 13:47:35,963 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512500 2023-11-26 13:47:39,128 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7500, loss[loss=0.0661, simple_loss=0.1002, pruned_loss=0.0107, audio_tagging_loss=0.005322, over 16777.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.0904, pruned_loss=0.01251, audio_tagging_loss=0.008629, over 3058055.31 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:47:42,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3416673.3333333335, ans=0.0 2023-11-26 13:47:43,323 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.713e+01 9.201e+01 9.904e+01 1.159e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-26 13:47:43,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3416673.3333333335, ans=0.125 2023-11-26 13:47:44,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3416673.3333333335, ans=0.0 2023-11-26 13:48:06,745 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=22.5 2023-11-26 13:48:16,871 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.90 vs. 
limit=22.5 2023-11-26 13:48:28,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3416940.0, ans=0.125 2023-11-26 13:48:31,297 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512550 2023-11-26 13:48:34,365 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7550, loss[loss=0.07276, simple_loss=0.1051, pruned_loss=0.01273, audio_tagging_loss=0.007497, over 14483.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09051, pruned_loss=0.0125, audio_tagging_loss=0.008591, over 3061310.72 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:48:37,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3417006.6666666665, ans=0.025 2023-11-26 13:48:53,789 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=15.0 2023-11-26 13:48:55,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3417073.3333333335, ans=0.0 2023-11-26 13:49:06,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3417140.0, ans=0.0 2023-11-26 13:49:20,470 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.01 vs. limit=15.0 2023-11-26 13:49:21,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3417273.3333333335, ans=0.125 2023-11-26 13:49:22,591 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.91 vs. limit=22.5 2023-11-26 13:49:27,947 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512600 2023-11-26 13:49:31,663 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7600, loss[loss=0.05294, simple_loss=0.07008, pruned_loss=0.009227, audio_tagging_loss=0.008674, over 15462.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08983, pruned_loss=0.01223, audio_tagging_loss=0.008575, over 3058913.01 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:49:35,804 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 8.691e+01 9.310e+01 9.815e+01 1.310e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-26 13:49:51,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3417406.6666666665, ans=0.125 2023-11-26 13:50:00,418 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.08 vs. limit=15.0 2023-11-26 13:50:04,081 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.75 vs. 
limit=15.0 2023-11-26 13:50:06,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3417540.0, ans=0.125 2023-11-26 13:50:22,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3417606.6666666665, ans=0.125 2023-11-26 13:50:24,319 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512650 2023-11-26 13:50:25,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3417606.6666666665, ans=0.0 2023-11-26 13:50:27,356 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7650, loss[loss=0.05831, simple_loss=0.07769, pruned_loss=0.01133, audio_tagging_loss=0.00813, over 15174.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08898, pruned_loss=0.01205, audio_tagging_loss=0.008619, over 3056319.85 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:50:32,007 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.59 vs. limit=22.5 2023-11-26 13:50:48,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=3417806.6666666665, ans=0.95 2023-11-26 13:50:50,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3417806.6666666665, ans=0.125 2023-11-26 13:50:54,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3417806.6666666665, ans=0.125 2023-11-26 13:51:08,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3417873.3333333335, ans=0.125 2023-11-26 13:51:10,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3417873.3333333335, ans=0.125 2023-11-26 13:51:19,497 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512700 2023-11-26 13:51:22,618 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7700, loss[loss=0.05856, simple_loss=0.07784, pruned_loss=0.009241, audio_tagging_loss=0.0104, over 14415.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08887, pruned_loss=0.01199, audio_tagging_loss=0.008655, over 3047071.00 frames. 
], batch size: 55, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:51:26,877 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.358e+01 8.781e+01 9.620e+01 1.045e+02 1.417e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-26 13:51:30,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3418006.6666666665, ans=0.125 2023-11-26 13:51:30,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3418006.6666666665, ans=0.1 2023-11-26 13:51:42,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3418073.3333333335, ans=0.2 2023-11-26 13:51:51,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3418140.0, ans=0.0 2023-11-26 13:52:03,514 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 13:52:16,045 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512750 2023-11-26 13:52:19,055 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7750, loss[loss=0.05171, simple_loss=0.06272, pruned_loss=0.009735, audio_tagging_loss=0.01062, over 14714.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08857, pruned_loss=0.01207, audio_tagging_loss=0.008757, over 3042533.02 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:52:37,851 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.56 vs. limit=5.0 2023-11-26 13:52:55,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3418540.0, ans=0.04949747468305833 2023-11-26 13:53:01,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3418540.0, ans=0.0 2023-11-26 13:53:11,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3418606.6666666665, ans=0.1 2023-11-26 13:53:12,074 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512800 2023-11-26 13:53:14,994 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.55 vs. limit=15.0 2023-11-26 13:53:15,453 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7800, loss[loss=0.06318, simple_loss=0.08719, pruned_loss=0.01066, audio_tagging_loss=0.008926, over 14953.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08896, pruned_loss=0.01228, audio_tagging_loss=0.008758, over 3045459.89 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:53:19,689 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.359e+01 9.103e+01 9.758e+01 1.031e+02 1.342e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-26 13:53:24,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3418673.3333333335, ans=0.0 2023-11-26 13:53:28,818 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.65 vs. 
limit=10.0 2023-11-26 13:54:07,484 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512850 2023-11-26 13:54:10,519 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7850, loss[loss=0.05846, simple_loss=0.08289, pruned_loss=0.007792, audio_tagging_loss=0.009218, over 15276.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08916, pruned_loss=0.01234, audio_tagging_loss=0.008845, over 3045210.77 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:54:11,086 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.04 vs. limit=22.5 2023-11-26 13:54:21,606 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.79 vs. limit=15.0 2023-11-26 13:54:30,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3419073.3333333335, ans=0.125 2023-11-26 13:54:43,400 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.11 vs. limit=15.0 2023-11-26 13:54:53,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3419206.6666666665, ans=0.125 2023-11-26 13:55:01,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3419273.3333333335, ans=0.125 2023-11-26 13:55:02,780 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512900 2023-11-26 13:55:02,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3419273.3333333335, ans=0.0 2023-11-26 13:55:06,486 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7900, loss[loss=0.0561, simple_loss=0.07048, pruned_loss=0.01102, audio_tagging_loss=0.009838, over 14457.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09, pruned_loss=0.01253, audio_tagging_loss=0.008808, over 3050097.62 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:55:06,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3419340.0, ans=0.125 2023-11-26 13:55:12,351 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.382e+01 9.008e+01 9.612e+01 1.015e+02 1.376e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-26 13:55:25,020 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.16 vs. limit=22.5 2023-11-26 13:55:43,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3419540.0, ans=0.0 2023-11-26 13:55:59,426 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 512950 2023-11-26 13:56:03,114 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 7950, loss[loss=0.0493, simple_loss=0.07133, pruned_loss=0.005918, audio_tagging_loss=0.007723, over 14499.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.0902, pruned_loss=0.01258, audio_tagging_loss=0.008832, over 3055410.88 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:56:11,142 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.20 vs. 
limit=15.0 2023-11-26 13:56:17,931 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 13:56:26,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3419806.6666666665, ans=0.1 2023-11-26 13:56:36,533 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.43 vs. limit=15.0 2023-11-26 13:56:40,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3419873.3333333335, ans=0.1 2023-11-26 13:56:49,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3419940.0, ans=0.125 2023-11-26 13:56:54,995 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513000 2023-11-26 13:56:58,393 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8000, loss[loss=0.07232, simple_loss=0.09806, pruned_loss=0.01376, audio_tagging_loss=0.009531, over 14128.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.08984, pruned_loss=0.01253, audio_tagging_loss=0.008971, over 3047926.38 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:57:03,736 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.322e+01 8.642e+01 9.203e+01 9.908e+01 1.245e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-26 13:57:39,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3420206.6666666665, ans=0.125 2023-11-26 13:57:50,843 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513050 2023-11-26 13:57:54,565 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8050, loss[loss=0.08559, simple_loss=0.1198, pruned_loss=0.01842, audio_tagging_loss=0.007258, over 14628.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08854, pruned_loss=0.01228, audio_tagging_loss=0.009148, over 3044658.24 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:58:33,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3420540.0, ans=0.04949747468305833 2023-11-26 13:58:41,159 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.65 vs. limit=15.0 2023-11-26 13:58:46,629 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513100 2023-11-26 13:58:50,324 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8100, loss[loss=0.09571, simple_loss=0.1329, pruned_loss=0.02229, audio_tagging_loss=0.006972, over 16554.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.0896, pruned_loss=0.01246, audio_tagging_loss=0.009099, over 3043147.79 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:58:56,152 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.587e+01 9.236e+01 9.771e+01 1.279e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-26 13:59:00,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3420740.0, ans=0.0 2023-11-26 13:59:04,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3420740.0, ans=0.1 2023-11-26 13:59:11,560 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.15 vs. limit=15.0 2023-11-26 13:59:36,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3420940.0, ans=0.125 2023-11-26 13:59:43,033 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513150 2023-11-26 13:59:46,085 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8150, loss[loss=0.05844, simple_loss=0.07746, pruned_loss=0.01072, audio_tagging_loss=0.008984, over 14248.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.08997, pruned_loss=0.01257, audio_tagging_loss=0.008876, over 3039935.47 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:59:58,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3421073.3333333335, ans=0.125 2023-11-26 14:00:02,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3421073.3333333335, ans=0.0 2023-11-26 14:00:34,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3421273.3333333335, ans=0.0 2023-11-26 14:00:37,941 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513200 2023-11-26 14:00:38,404 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.09 vs. limit=15.0 2023-11-26 14:00:41,358 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8200, loss[loss=0.04671, simple_loss=0.06105, pruned_loss=0.007925, audio_tagging_loss=0.00826, over 14600.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08954, pruned_loss=0.0123, audio_tagging_loss=0.00876, over 3043051.69 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:00:41,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3421340.0, ans=0.0 2023-11-26 14:00:43,497 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24
2023-11-26 14:00:46,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3421340.0, ans=0.125
2023-11-26 14:00:47,123 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.432e+01 8.881e+01 9.462e+01 1.023e+02 1.490e+02, threshold=1.892e+02, percent-clipped=0.0
2023-11-26 14:00:53,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3421406.6666666665, ans=0.2
2023-11-26 14:00:53,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3421406.6666666665, ans=0.0
2023-11-26 14:01:08,828 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.81 vs. limit=10.0
2023-11-26 14:01:12,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3421473.3333333335, ans=0.0
2023-11-26 14:01:26,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=3421606.6666666665, ans=0.2
2023-11-26 14:01:29,489 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.17 vs. limit=22.5
2023-11-26 14:01:30,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3421606.6666666665, ans=0.0
2023-11-26 14:01:34,725 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513250
2023-11-26 14:01:37,868 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8250, loss[loss=0.0807, simple_loss=0.114, pruned_loss=0.0162, audio_tagging_loss=0.007496, over 15802.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08934, pruned_loss=0.01214, audio_tagging_loss=0.008689, over 3044838.07 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 14:02:10,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3421873.3333333335, ans=0.125
2023-11-26 14:02:30,603 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513300
2023-11-26 14:02:33,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3422006.6666666665, ans=0.1
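Each `[train_asr.py:1235]` record reports the per-batch loss components in `loss[...]` followed by a running average in `tot_loss[...]`. The components are consistent with the total being 0.5 × simple_loss + pruned_loss + audio_tagging_loss; the scale factors below are inferred from the logged numbers themselves, not taken from the training code:

```python
def total_loss(simple_loss, pruned_loss, audio_tagging_loss,
               simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
    # Scales are assumptions reverse-engineered from the logged values.
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Batch 8250 above: 0.5 * 0.114 + 0.0162 + 0.007496 = 0.0807,
# matching the logged loss=0.0807 (up to rounding in the log).
assert abs(total_loss(0.114, 0.0162, 0.007496) - 0.0807) < 5e-4
```

The same check holds for the other per-batch records in this section, which is why `tot_loss` moves so little while the per-batch `loss` swings between roughly 0.05 and 0.09.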
2023-11-26 14:02:34,281 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8300, loss[loss=0.07427, simple_loss=0.09941, pruned_loss=0.01692, audio_tagging_loss=0.007649, over 15380.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08858, pruned_loss=0.01197, audio_tagging_loss=0.008614, over 3047574.04 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:02:40,662 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.251e+01 8.839e+01 9.487e+01 1.004e+02 1.588e+02, threshold=1.897e+02, percent-clipped=0.0
2023-11-26 14:03:02,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3422140.0, ans=0.035
2023-11-26 14:03:02,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3422140.0, ans=0.04949747468305833
2023-11-26 14:03:02,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3422140.0, ans=0.0
2023-11-26 14:03:04,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3422140.0, ans=0.0
2023-11-26 14:03:16,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3422206.6666666665, ans=0.1
2023-11-26 14:03:19,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3422273.3333333335, ans=0.05
2023-11-26 14:03:26,289 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513350
2023-11-26 14:03:29,407 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8350, loss[loss=0.07627, simple_loss=0.1039, pruned_loss=0.01504, audio_tagging_loss=0.009259, over 16496.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08933, pruned_loss=0.01211, audio_tagging_loss=0.008583, over 3058441.65 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 8.0
2023-11-26 14:03:38,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3422340.0, ans=0.125
2023-11-26 14:03:43,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3422406.6666666665, ans=0.125
2023-11-26 14:03:43,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3422406.6666666665, ans=0.125
2023-11-26 14:04:22,054 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513400
2023-11-26 14:04:25,970 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8400, loss[loss=0.08724, simple_loss=0.1228, pruned_loss=0.01839, audio_tagging_loss=0.007469, over 14899.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08978, pruned_loss=0.01227, audio_tagging_loss=0.008598, over 3058178.41 frames.
], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:04:30,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3422673.3333333335, ans=0.2 2023-11-26 14:04:33,773 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.236e+01 8.557e+01 9.224e+01 9.865e+01 1.202e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-26 14:04:37,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3422740.0, ans=0.1 2023-11-26 14:04:50,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3422806.6666666665, ans=0.0 2023-11-26 14:05:03,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3422873.3333333335, ans=0.125 2023-11-26 14:05:04,961 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.43 vs. limit=15.0 2023-11-26 14:05:18,111 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513450 2023-11-26 14:05:18,698 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.34 vs. limit=15.0 2023-11-26 14:05:21,188 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8450, loss[loss=0.08122, simple_loss=0.103, pruned_loss=0.01823, audio_tagging_loss=0.0115, over 15163.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09008, pruned_loss=0.01243, audio_tagging_loss=0.008669, over 3059083.74 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:05:29,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3423006.6666666665, ans=0.125 2023-11-26 14:05:40,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3423073.3333333335, ans=0.125 2023-11-26 14:06:13,884 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513500 2023-11-26 14:06:16,997 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8500, loss[loss=0.0659, simple_loss=0.08299, pruned_loss=0.01498, audio_tagging_loss=0.00943, over 14280.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09046, pruned_loss=0.01238, audio_tagging_loss=0.008736, over 3058194.95 frames. ], batch size: 52, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:06:17,475 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.49 vs. limit=15.0 2023-11-26 14:06:24,288 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.353e+01 8.916e+01 9.646e+01 1.037e+02 1.510e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-26 14:06:56,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3423540.0, ans=0.0 2023-11-26 14:07:04,200 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.79 vs. 
limit=15.0 2023-11-26 14:07:05,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3423606.6666666665, ans=0.125 2023-11-26 14:07:09,549 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513550 2023-11-26 14:07:12,639 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8550, loss[loss=0.05656, simple_loss=0.08118, pruned_loss=0.00831, audio_tagging_loss=0.007657, over 14821.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09044, pruned_loss=0.01229, audio_tagging_loss=0.008732, over 3046795.20 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:07:19,077 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.99 vs. limit=10.0 2023-11-26 14:07:31,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3423740.0, ans=0.0 2023-11-26 14:07:33,143 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.17 vs. limit=15.0 2023-11-26 14:07:46,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3423873.3333333335, ans=0.125 2023-11-26 14:08:03,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3423940.0, ans=0.0 2023-11-26 14:08:05,496 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0 2023-11-26 14:08:05,907 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513600 2023-11-26 14:08:09,287 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8600, loss[loss=0.06909, simple_loss=0.09883, pruned_loss=0.01235, audio_tagging_loss=0.007326, over 15773.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08968, pruned_loss=0.01243, audio_tagging_loss=0.008768, over 3038520.64 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:08:16,720 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.755e+01 9.267e+01 1.010e+02 1.487e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-26 14:08:24,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3424073.3333333335, ans=0.0 2023-11-26 14:08:43,254 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=22.5 2023-11-26 14:08:50,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3424206.6666666665, ans=0.125 2023-11-26 14:08:50,213 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.89 vs. limit=22.5 2023-11-26 14:09:01,411 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513650 2023-11-26 14:09:05,089 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8650, loss[loss=0.0692, simple_loss=0.09735, pruned_loss=0.01075, audio_tagging_loss=0.009773, over 16266.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08996, pruned_loss=0.01231, audio_tagging_loss=0.00879, over 3040588.70 frames. 
], batch size: 60, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:09:16,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3424406.6666666665, ans=0.125 2023-11-26 14:09:34,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3424473.3333333335, ans=0.1 2023-11-26 14:09:43,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3424540.0, ans=0.5 2023-11-26 14:09:43,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3424540.0, ans=0.125 2023-11-26 14:09:56,938 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513700 2023-11-26 14:10:00,545 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8700, loss[loss=0.06737, simple_loss=0.08991, pruned_loss=0.01622, audio_tagging_loss=0.006204, over 15114.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08925, pruned_loss=0.01244, audio_tagging_loss=0.008886, over 3040132.58 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 14:10:06,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3424673.3333333335, ans=0.125 2023-11-26 14:10:08,468 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.016e+01 8.732e+01 9.410e+01 1.013e+02 1.633e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-26 14:10:11,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3424740.0, ans=0.1 2023-11-26 14:10:16,074 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2023-11-26 14:10:46,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3424940.0, ans=0.125 2023-11-26 14:10:53,877 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513750 2023-11-26 14:10:57,011 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8750, loss[loss=0.08361, simple_loss=0.1183, pruned_loss=0.01664, audio_tagging_loss=0.007816, over 14818.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09027, pruned_loss=0.01263, audio_tagging_loss=0.008896, over 3043682.33 frames. 
], batch size: 55, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:10:57,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3425006.6666666665, ans=0.125
2023-11-26 14:11:06,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3425073.3333333335, ans=0.1
2023-11-26 14:11:08,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3425073.3333333335, ans=0.2
2023-11-26 14:11:29,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3425206.6666666665, ans=0.1
2023-11-26 14:11:47,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3425273.3333333335, ans=0.1
2023-11-26 14:11:49,023 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513800
2023-11-26 14:11:52,411 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8800, loss[loss=0.06171, simple_loss=0.08247, pruned_loss=0.01156, audio_tagging_loss=0.008918, over 15373.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09076, pruned_loss=0.01253, audio_tagging_loss=0.00883, over 3047950.24 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 14:12:00,265 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.281e+01 8.965e+01 9.351e+01 9.840e+01 1.391e+02, threshold=1.870e+02, percent-clipped=0.0
2023-11-26 14:12:08,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3425406.6666666665, ans=0.1
2023-11-26 14:12:11,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3425406.6666666665, ans=0.1
2023-11-26 14:12:14,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3425473.3333333335, ans=0.0
2023-11-26 14:12:21,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3425473.3333333335, ans=0.125
2023-11-26 14:12:31,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3425540.0, ans=0.0
2023-11-26 14:12:31,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3425540.0, ans=0.125
2023-11-26 14:12:33,743 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 14:12:44,786 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513850
2023-11-26 14:12:48,465 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8850, loss[loss=0.06903, simple_loss=0.09863, pruned_loss=0.01159, audio_tagging_loss=0.008128, over 14746.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.0902, pruned_loss=0.01244, audio_tagging_loss=0.008861, over 3046133.24 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 14:12:50,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3425673.3333333335, ans=0.125
2023-11-26 14:13:01,196 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
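The WARNING above (repeated throughout this log for other AudioSet cuts) shows why 1-second cuts with the dummy transcript are dropped: the placeholder text tokenizes to 24 BPE tokens, but only 23 encoder frames survive subsampling, and a transducer loss cannot emit more symbols than it has frames. A sketch of that check; the subsampling arithmetic `(T - 8) // 4` is an assumption that happens to reproduce the logged 100 → 23:

```python
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Sketch of the exclusion rule implied by the WARNING records.

    100 input frames come out as 23 after the encoder frontend
    (consistent with t_out = (num_frames - 8) // 4, an assumed formula).
    The transducer needs at least one output frame per token, so the
    24-token dummy transcript on a 1-second cut cannot be aligned.
    """
    t_out = (num_frames - 8) // 4   # assumed frontend subsampling: 100 -> 23
    return t_out >= num_tokens

# keep_cut(100, 24) -> False, matching the excluded cuts in this log.
```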
2023-11-26 14:13:14,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3425806.6666666665, ans=0.125
2023-11-26 14:13:15,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3425806.6666666665, ans=0.0
2023-11-26 14:13:16,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3425806.6666666665, ans=0.125
2023-11-26 14:13:28,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=3425873.3333333335, ans=0.2
2023-11-26 14:13:39,222 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.35 vs. limit=22.5
2023-11-26 14:13:40,752 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513900
2023-11-26 14:13:44,392 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8900, loss[loss=0.04633, simple_loss=0.05845, pruned_loss=0.009488, audio_tagging_loss=0.00762, over 15008.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09039, pruned_loss=0.01242, audio_tagging_loss=0.008814, over 3053407.96 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:13:52,702 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.145e+01 8.670e+01 9.413e+01 1.057e+02 1.382e+02, threshold=1.883e+02, percent-clipped=0.0
2023-11-26 14:14:36,301 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 513950
2023-11-26 14:14:36,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3426273.3333333335, ans=0.125
2023-11-26 14:14:39,372 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 8950, loss[loss=0.07013, simple_loss=0.1025, pruned_loss=0.01301, audio_tagging_loss=0.005865, over 15161.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09058, pruned_loss=0.01242, audio_tagging_loss=0.008643, over 3047807.19 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:14:57,128 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.19 vs. limit=10.0
2023-11-26 14:15:07,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3426473.3333333335, ans=0.0
2023-11-26 14:15:16,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3426540.0, ans=0.125
2023-11-26 14:15:30,703 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514000
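The ubiquitous `[scaling.py:213]` records print the current value (`ans`) of a named `ScheduledFloat`: a scalar hyperparameter such as a dropout probability, balancer target, or skip rate that is annealed as a function of `batch_count`. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between breakpoints; the class name and breakpoints below are illustrative, not icefall's actual API or defaults:

```python
class ScheduledFloatSketch:
    """Batch-count-dependent float: linear interpolation between
    (batch_count, value) breakpoints, clamped at both ends."""

    def __init__(self, *points):
        self.points = sorted(points)   # e.g. (0.0, 0.3), (20000.0, 0.1)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
# dropout_p(3425806.67) -> 0.1
```

By `batch_count` ≈ 3.4 million, as in the records above, every such schedule is flat at its final value, which is why the logged `ans` values (0.1, 0.125, 0.0, ...) never change across this section.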
2023-11-26 14:15:34,073 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9000, loss[loss=0.07011, simple_loss=0.08826, pruned_loss=0.01588, audio_tagging_loss=0.0101, over 14837.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09111, pruned_loss=0.0126, audio_tagging_loss=0.008584, over 3051218.36 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:15:34,075 INFO [train_asr.py:1258] (0/4) Computing validation loss
2023-11-26 14:15:49,352 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.8913, 2.5570, 2.2966, 2.6604, 2.4017, 2.4611, 2.4192, 2.4993], device='cuda:0')
2023-11-26 14:15:57,877 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.9498, 1.6440, 3.5229, 3.1069, 2.9361, 3.1493, 3.0938, 3.2311], device='cuda:0')
2023-11-26 14:16:06,632 INFO [train_asr.py:1267] (0/4) Epoch 43, validation: loss=0.05882, simple_loss=0.0506, pruned_loss=0.005335, audio_tagging_loss=0.02819, over 4681554.00 frames.
2023-11-26 14:16:06,633 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB
2023-11-26 14:16:12,669 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.77 vs. limit=22.5
2023-11-26 14:16:14,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3426673.3333333335, ans=0.125
2023-11-26 14:16:15,120 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.126e+01 8.986e+01 9.503e+01 1.043e+02 1.217e+02, threshold=1.901e+02, percent-clipped=0.0
2023-11-26 14:16:29,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3426806.6666666665, ans=0.0
2023-11-26 14:16:42,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3426873.3333333335, ans=0.0
2023-11-26 14:16:46,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3426873.3333333335, ans=0.0
2023-11-26 14:16:56,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3426940.0, ans=0.125
2023-11-26 14:16:58,880 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514050
2023-11-26 14:17:01,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3427006.6666666665, ans=0.125
2023-11-26 14:17:01,969 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9050, loss[loss=0.06587, simple_loss=0.0971, pruned_loss=0.008801, audio_tagging_loss=0.008517, over 15583.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09107, pruned_loss=0.01252, audio_tagging_loss=0.008527, over 3046293.90 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:17:06,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3427006.6666666665, ans=0.0
2023-11-26 14:17:34,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3427140.0, ans=0.125
2023-11-26 14:17:40,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3427206.6666666665, ans=0.125
2023-11-26 14:17:54,115 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514100
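During the validation pass above, the `[zipformer.py:1877]` records dump one entropy value per attention head (eight values here). Low entropy means a head concentrates on few positions; uniformly high entropy can indicate a head that has degenerated into averaging. A hedged sketch of how such a per-head diagnostic can be computed; the tensor layout and the averaging over query positions are assumptions:

```python
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    """Per-head entropy of attention weights.

    `attn` is assumed to be shaped (num_heads, tgt_len, src_len) with
    each row summing to 1; returns one value per head, analogous to the
    zipformer.py:1877 diagnostic tensors above.
    """
    eps = 1.0e-20                                   # avoid log(0)
    ent = -(attn * (attn + eps).log()).sum(dim=-1)  # (num_heads, tgt_len)
    return ent.mean(dim=-1)                         # average over queries
```

Read against the records above, the outlier head with entropy 1.6440 in layer 0 is attending far more selectively than its siblings at around 3.0.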
2023-11-26 14:17:57,287 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.09 vs. limit=22.5
2023-11-26 14:17:57,790 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9100, loss[loss=0.06116, simple_loss=0.08867, pruned_loss=0.01278, audio_tagging_loss=0.004037, over 15091.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.0905, pruned_loss=0.01256, audio_tagging_loss=0.008453, over 3044644.97 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:18:00,453 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.70 vs. limit=10.0
2023-11-26 14:18:07,454 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.029e+01 8.883e+01 9.542e+01 1.028e+02 1.451e+02, threshold=1.908e+02, percent-clipped=0.0
2023-11-26 14:18:12,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3427406.6666666665, ans=0.2
2023-11-26 14:18:17,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=3427406.6666666665, ans=15.0
2023-11-26 14:18:20,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3427473.3333333335, ans=0.125
2023-11-26 14:18:22,359 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.17 vs. limit=22.5
2023-11-26 14:18:31,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3427540.0, ans=0.07
2023-11-26 14:18:32,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3427540.0, ans=0.0
2023-11-26 14:18:51,491 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514150
2023-11-26 14:18:55,201 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9150, loss[loss=0.06534, simple_loss=0.08049, pruned_loss=0.01174, audio_tagging_loss=0.01336, over 14971.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08968, pruned_loss=0.01248, audio_tagging_loss=0.008558, over 3042120.73 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:18:57,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3427673.3333333335, ans=0.125
2023-11-26 14:19:05,368 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0
2023-11-26 14:19:18,378 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.10 vs. limit=15.0
2023-11-26 14:19:46,764 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514200
2023-11-26 14:19:48,522 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.18 vs. limit=22.5
2023-11-26 14:19:50,129 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9200, loss[loss=0.08724, simple_loss=0.1157, pruned_loss=0.02018, audio_tagging_loss=0.009225, over 15266.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08944, pruned_loss=0.01242, audio_tagging_loss=0.008626, over 3045800.33 frames.
], batch size: 55, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:19:58,737 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.474e+01 8.729e+01 9.387e+01 1.004e+02 1.309e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 14:20:04,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3428073.3333333335, ans=0.09899494936611666 2023-11-26 14:20:42,627 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514250 2023-11-26 14:20:42,944 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.56 vs. limit=15.0 2023-11-26 14:20:45,752 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9250, loss[loss=0.0905, simple_loss=0.1361, pruned_loss=0.01545, audio_tagging_loss=0.007018, over 15839.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.0885, pruned_loss=0.01217, audio_tagging_loss=0.008747, over 3045631.37 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:20:49,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3428340.0, ans=0.125 2023-11-26 14:21:03,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3428406.6666666665, ans=0.125 2023-11-26 14:21:07,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3428406.6666666665, ans=0.125 2023-11-26 14:21:10,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3428473.3333333335, ans=0.125 2023-11-26 14:21:30,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3428606.6666666665, ans=0.05 2023-11-26 14:21:35,706 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.99 vs. limit=15.0 2023-11-26 14:21:36,832 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.10 vs. limit=15.0 2023-11-26 14:21:38,504 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514300 2023-11-26 14:21:42,682 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9300, loss[loss=0.08964, simple_loss=0.1275, pruned_loss=0.0192, audio_tagging_loss=0.006696, over 15111.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08906, pruned_loss=0.0123, audio_tagging_loss=0.008741, over 3048342.85 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:21:45,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3428673.3333333335, ans=0.125 2023-11-26 14:21:51,813 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.291e+01 8.838e+01 9.271e+01 1.020e+02 1.264e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-26 14:21:56,713 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.65 vs. 
limit=15.0 2023-11-26 14:22:14,900 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 14:22:20,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3428873.3333333335, ans=0.125 2023-11-26 14:22:35,728 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514350 2023-11-26 14:22:35,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3428940.0, ans=0.125 2023-11-26 14:22:35,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3428940.0, ans=0.2 2023-11-26 14:22:38,795 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9350, loss[loss=0.07859, simple_loss=0.1028, pruned_loss=0.01963, audio_tagging_loss=0.007569, over 14499.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08928, pruned_loss=0.01232, audio_tagging_loss=0.008777, over 3046043.27 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:22:39,404 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.38 vs. limit=15.0 2023-11-26 14:22:44,796 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.87 vs. limit=10.0 2023-11-26 14:22:46,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3429006.6666666665, ans=0.0 2023-11-26 14:22:46,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3429006.6666666665, ans=0.04949747468305833 2023-11-26 14:22:47,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3429006.6666666665, ans=0.0 2023-11-26 14:22:58,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3429073.3333333335, ans=0.125 2023-11-26 14:22:58,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3429073.3333333335, ans=0.0 2023-11-26 14:22:58,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3429073.3333333335, ans=0.125 2023-11-26 14:23:28,515 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.89 vs. limit=10.0 2023-11-26 14:23:30,923 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514400 2023-11-26 14:23:32,858 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.92 vs. limit=22.5 2023-11-26 14:23:34,340 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9400, loss[loss=0.08226, simple_loss=0.1183, pruned_loss=0.01443, audio_tagging_loss=0.008662, over 16028.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08895, pruned_loss=0.01216, audio_tagging_loss=0.008829, over 3043517.87 frames. 
], batch size: 61, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:23:44,423 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.222e+01 8.705e+01 9.718e+01 1.044e+02 1.326e+02, threshold=1.944e+02, percent-clipped=0.0
2023-11-26 14:23:52,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3429406.6666666665, ans=0.125
2023-11-26 14:24:00,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3429473.3333333335, ans=0.1
2023-11-26 14:24:01,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3429473.3333333335, ans=0.125
2023-11-26 14:24:09,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3429540.0, ans=0.125
2023-11-26 14:24:18,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3429606.6666666665, ans=0.5
2023-11-26 14:24:27,202 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514450
2023-11-26 14:24:30,859 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9450, loss[loss=0.06331, simple_loss=0.08502, pruned_loss=0.01137, audio_tagging_loss=0.009425, over 14371.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08954, pruned_loss=0.01221, audio_tagging_loss=0.008857, over 3045592.91 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:24:30,905 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 14:24:56,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3429806.6666666665, ans=0.1
2023-11-26 14:25:07,342 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.55 vs. limit=15.0
2023-11-26 14:25:17,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3429940.0, ans=0.125
2023-11-26 14:25:23,935 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514500
2023-11-26 14:25:24,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3429940.0, ans=0.125
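The `[scaling.py:1022]` Whitening records fire when a module's output covariance drifts toward the logged `limit`; a corrective penalty applies only when `metric` exceeds `limit`, which is why the printed values sit near or just under their limits. One plausible definition of the metric, consistent with it being ≥ 1.0 and equal to 1.0 for a perfectly white (isotropic) covariance; this is a sketch, not icefall's exact `scaling.py` code:

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """Assumed whitening metric: mean squared eigenvalue of the feature
    covariance divided by the squared mean eigenvalue (>= 1.0 by
    Cauchy-Schwarz, == 1.0 when all eigenvalues are equal)."""
    x = x.reshape(-1, x.shape[-1]).to(torch.float32)
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]            # (d, d) feature covariance
    d = cov.shape[0]
    mean_eig = torch.diagonal(cov).mean()     # trace(C)/d = mean eigenvalue
    mean_sq_eig = (cov * cov).sum() / d       # trace(C^2)/d = mean eig^2
    return mean_sq_eig / mean_eig ** 2
```

Under this reading, `metric=10.55 vs. limit=15.0` above says the feed-forward output covariance is noticeably anisotropic but still inside its allowed band, so no penalty was applied.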
2023-11-26 14:25:27,577 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9500, loss[loss=0.06604, simple_loss=0.09482, pruned_loss=0.00952, audio_tagging_loss=0.009109, over 14658.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08967, pruned_loss=0.01221, audio_tagging_loss=0.008981, over 3039740.03 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:25:37,199 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.279e+01 8.962e+01 9.623e+01 1.023e+02 1.293e+02, threshold=1.925e+02, percent-clipped=0.0
2023-11-26 14:25:49,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3430140.0, ans=0.125
2023-11-26 14:25:51,030 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.82 vs. limit=22.5
2023-11-26 14:26:01,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3430206.6666666665, ans=0.0
2023-11-26 14:26:13,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3430273.3333333335, ans=0.2
2023-11-26 14:26:13,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3430273.3333333335, ans=0.125
2023-11-26 14:26:19,713 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514550
2023-11-26 14:26:22,797 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9550, loss[loss=0.06002, simple_loss=0.08329, pruned_loss=0.01057, audio_tagging_loss=0.007804, over 15102.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08878, pruned_loss=0.01215, audio_tagging_loss=0.009033, over 3038659.10 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:26:22,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3430340.0, ans=0.125
2023-11-26 14:26:35,428 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 14:26:39,448 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.08 vs. limit=15.0
2023-11-26 14:26:51,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3430473.3333333335, ans=0.125
2023-11-26 14:26:53,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3430473.3333333335, ans=0.0
2023-11-26 14:27:03,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3430540.0, ans=0.1
2023-11-26 14:27:15,633 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514600
2023-11-26 14:27:19,025 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9600, loss[loss=0.09245, simple_loss=0.1277, pruned_loss=0.02071, audio_tagging_loss=0.007875, over 14191.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08929, pruned_loss=0.0123, audio_tagging_loss=0.009036, over 3041807.75 frames.
], batch size: 53, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:27:27,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3430673.3333333335, ans=0.125 2023-11-26 14:27:29,646 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.491e+01 8.862e+01 9.478e+01 1.011e+02 2.091e+02, threshold=1.896e+02, percent-clipped=1.0 2023-11-26 14:27:43,358 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.72 vs. limit=22.5 2023-11-26 14:27:46,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3430806.6666666665, ans=0.125 2023-11-26 14:27:57,158 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.70 vs. limit=15.0 2023-11-26 14:28:01,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3430873.3333333335, ans=0.125 2023-11-26 14:28:04,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3430940.0, ans=0.0 2023-11-26 14:28:11,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3430940.0, ans=0.125 2023-11-26 14:28:12,596 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514650 2023-11-26 14:28:13,004 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.97 vs. limit=15.0 2023-11-26 14:28:15,753 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9650, loss[loss=0.05618, simple_loss=0.07773, pruned_loss=0.008021, audio_tagging_loss=0.009289, over 15615.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08892, pruned_loss=0.01222, audio_tagging_loss=0.009028, over 3037602.49 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:28:15,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3431006.6666666665, ans=0.0 2023-11-26 14:28:18,553 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.88 vs. limit=10.0 2023-11-26 14:28:32,839 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.99 vs. limit=15.0 2023-11-26 14:28:33,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3431073.3333333335, ans=0.125 2023-11-26 14:28:55,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3431206.6666666665, ans=0.125 2023-11-26 14:29:08,399 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514700 2023-11-26 14:29:09,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3431273.3333333335, ans=0.1 2023-11-26 14:29:11,521 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9700, loss[loss=0.07668, simple_loss=0.1099, pruned_loss=0.0146, audio_tagging_loss=0.007106, over 14207.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08928, pruned_loss=0.01242, audio_tagging_loss=0.008863, over 3032792.85 frames. 
], batch size: 52, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:29:14,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3431340.0, ans=0.0 2023-11-26 14:29:21,813 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.600e+01 8.878e+01 9.480e+01 1.018e+02 1.289e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-26 14:29:28,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3431406.6666666665, ans=0.0 2023-11-26 14:29:30,383 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.52 vs. limit=15.0 2023-11-26 14:29:39,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3431473.3333333335, ans=0.2 2023-11-26 14:29:40,448 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.86 vs. limit=10.0 2023-11-26 14:29:41,655 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.33 vs. limit=15.0 2023-11-26 14:29:44,698 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0 2023-11-26 14:30:04,725 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514750 2023-11-26 14:30:04,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3431606.6666666665, ans=0.125 2023-11-26 14:30:07,852 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9750, loss[loss=0.05702, simple_loss=0.07132, pruned_loss=0.01202, audio_tagging_loss=0.009333, over 13796.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08969, pruned_loss=0.01258, audio_tagging_loss=0.008648, over 3038226.26 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:30:10,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3431673.3333333335, ans=0.1 2023-11-26 14:30:19,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3431740.0, ans=0.0 2023-11-26 14:30:33,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3431806.6666666665, ans=0.1 2023-11-26 14:30:52,241 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.97 vs. limit=22.5 2023-11-26 14:31:01,237 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514800 2023-11-26 14:31:03,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3432006.6666666665, ans=0.125 2023-11-26 14:31:04,621 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9800, loss[loss=0.08505, simple_loss=0.1207, pruned_loss=0.01745, audio_tagging_loss=0.007244, over 14756.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09023, pruned_loss=0.01261, audio_tagging_loss=0.008627, over 3036413.99 frames. 
], batch size: 55, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:31:14,168 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.491e+01 8.979e+01 9.504e+01 1.025e+02 1.204e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-26 14:31:17,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3432073.3333333335, ans=0.125 2023-11-26 14:31:29,994 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.60 vs. limit=15.0 2023-11-26 14:31:40,013 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.09 vs. limit=10.0 2023-11-26 14:31:56,026 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 14:31:56,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3432273.3333333335, ans=0.0 2023-11-26 14:31:57,135 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514850 2023-11-26 14:32:00,295 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9850, loss[loss=0.07069, simple_loss=0.09816, pruned_loss=0.01438, audio_tagging_loss=0.007226, over 16720.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09124, pruned_loss=0.01273, audio_tagging_loss=0.008532, over 3040903.98 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:32:06,738 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.32 vs. 
limit=12.0 2023-11-26 14:32:13,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3432406.6666666665, ans=0.2 2023-11-26 14:32:22,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3432473.3333333335, ans=0.1 2023-11-26 14:32:22,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3432473.3333333335, ans=0.0 2023-11-26 14:32:36,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3432540.0, ans=0.05 2023-11-26 14:32:36,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3432540.0, ans=0.05 2023-11-26 14:32:42,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3432540.0, ans=0.125 2023-11-26 14:32:44,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3432606.6666666665, ans=0.125 2023-11-26 14:32:53,113 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514900 2023-11-26 14:32:54,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3432606.6666666665, ans=0.1 2023-11-26 14:32:56,775 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9900, loss[loss=0.06668, simple_loss=0.09687, pruned_loss=0.0127, audio_tagging_loss=0.005548, over 14912.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09093, pruned_loss=0.01258, audio_tagging_loss=0.008505, over 3042481.43 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:33:07,498 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.992e+01 8.539e+01 9.208e+01 1.007e+02 1.176e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-26 14:33:50,433 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 514950 2023-11-26 14:33:53,510 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 9950, loss[loss=0.07291, simple_loss=0.09333, pruned_loss=0.01737, audio_tagging_loss=0.008874, over 14751.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09083, pruned_loss=0.01254, audio_tagging_loss=0.008503, over 3040465.71 frames. 
], batch size: 57, lr: 1.56e-03, grad_scale: 32.0 2023-11-26 14:34:10,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3433073.3333333335, ans=0.125 2023-11-26 14:34:15,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3433140.0, ans=0.0 2023-11-26 14:34:18,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3433140.0, ans=0.025 2023-11-26 14:34:21,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3433140.0, ans=0.1 2023-11-26 14:34:29,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3433206.6666666665, ans=0.0 2023-11-26 14:34:45,648 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515000 2023-11-26 14:34:49,117 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10000, loss[loss=0.06089, simple_loss=0.0833, pruned_loss=0.01252, audio_tagging_loss=0.006725, over 14405.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08997, pruned_loss=0.01244, audio_tagging_loss=0.008535, over 3044269.60 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 32.0 2023-11-26 14:34:50,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3433340.0, ans=0.07 2023-11-26 14:34:59,089 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.754e+01 8.778e+01 9.390e+01 1.020e+02 2.265e+02, threshold=1.878e+02, percent-clipped=1.0 2023-11-26 14:35:10,793 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.82 vs. limit=15.0 2023-11-26 14:35:21,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3433473.3333333335, ans=0.0 2023-11-26 14:35:29,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3433540.0, ans=0.0 2023-11-26 14:35:38,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3433606.6666666665, ans=0.125 2023-11-26 14:35:41,094 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515050 2023-11-26 14:35:45,417 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10050, loss[loss=0.07658, simple_loss=0.1189, pruned_loss=0.01116, audio_tagging_loss=0.005956, over 15912.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.0903, pruned_loss=0.01255, audio_tagging_loss=0.008536, over 3043130.35 frames. 
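[Annotation] The optim.py records above ("Clipping_scale=2.0, grad-norm quartiles ... threshold=..., percent-clipped=...") report five order statistics (min, 25%, median, 75%, max) of recent gradient norms, and the threshold equals Clipping_scale times the median: in the record just above, 2.0 * 9.390e+01 = 1.878e+02, and the max norm 2.265e+02 exceeded it, hence percent-clipped=1.0. A sketch of median-based clipping consistent with those records follows; the window size, update cadence, and class name are assumptions.

from collections import deque
import torch

class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # recent global grad norms

    def clip_(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        # Global L2 norm over all parameter gradients.
        norm = torch.norm(
            torch.stack([p.grad.detach().norm() for p in params])).item()
        self.norms.append(norm)
        # min / 25% / median / 75% / max, as printed in the log.
        q = torch.quantile(
            torch.tensor(list(self.norms)),
            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()
        if norm > threshold:
            for p in params:
                p.grad.mul_(threshold / norm)
        return threshold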
], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:35:50,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3433673.3333333335, ans=0.125 2023-11-26 14:36:18,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3433873.3333333335, ans=0.015 2023-11-26 14:36:22,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3433873.3333333335, ans=0.125 2023-11-26 14:36:23,000 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.01 vs. limit=15.0 2023-11-26 14:36:33,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3433940.0, ans=0.1 2023-11-26 14:36:37,437 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515100 2023-11-26 14:36:41,169 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10100, loss[loss=0.06119, simple_loss=0.08142, pruned_loss=0.01107, audio_tagging_loss=0.009407, over 14551.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08964, pruned_loss=0.01233, audio_tagging_loss=0.008565, over 3051414.56 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:36:51,887 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.484e+01 8.526e+01 9.238e+01 9.912e+01 1.286e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-26 14:37:00,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3434073.3333333335, ans=0.125 2023-11-26 14:37:03,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3434140.0, ans=0.125 2023-11-26 14:37:21,817 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.97 vs. limit=15.0 2023-11-26 14:37:24,757 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.19 vs. limit=22.5 2023-11-26 14:37:28,169 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 14:37:29,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3434273.3333333335, ans=0.125 2023-11-26 14:37:33,542 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515150 2023-11-26 14:37:36,316 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.91 vs. limit=15.0 2023-11-26 14:37:36,673 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10150, loss[loss=0.05823, simple_loss=0.06894, pruned_loss=0.01241, audio_tagging_loss=0.01135, over 15255.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08983, pruned_loss=0.01223, audio_tagging_loss=0.008644, over 3049790.98 frames. 
], batch size: 58, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:37:38,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3434340.0, ans=0.035 2023-11-26 14:37:40,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=3434340.0, ans=0.2 2023-11-26 14:37:42,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3434340.0, ans=0.1 2023-11-26 14:37:42,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3434340.0, ans=0.1 2023-11-26 14:37:42,439 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.17 vs. limit=15.0 2023-11-26 14:37:49,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3434406.6666666665, ans=0.1 2023-11-26 14:37:58,593 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.82 vs. limit=15.0 2023-11-26 14:38:02,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3434473.3333333335, ans=0.125 2023-11-26 14:38:05,428 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 14:38:08,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3434473.3333333335, ans=0.0 2023-11-26 14:38:10,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3434540.0, ans=0.125 2023-11-26 14:38:19,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3434540.0, ans=0.0 2023-11-26 14:38:23,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3434606.6666666665, ans=0.125 2023-11-26 14:38:28,846 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515200 2023-11-26 14:38:32,193 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10200, loss[loss=0.06951, simple_loss=0.1009, pruned_loss=0.01114, audio_tagging_loss=0.007914, over 16233.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08916, pruned_loss=0.01225, audio_tagging_loss=0.008862, over 3040935.60 frames. 
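[Annotation] The recurring "Exclude cut ..." warnings above show why AudioSet cuts carrying dummy transcripts get dropped: a 1-second cut has 100 feature frames, which shrink to 23 frames after encoder subsampling, fewer than its 24 BPE tokens, and a transducer loss cannot align a label sequence longer than the encoder output. A sketch of that filter is below; the subsampling formula is a stand-in chosen to reproduce the logged 100 -> 23 mapping, not the encoder's exact arithmetic.

def num_frames_after_subsampling(num_frames: int) -> int:
    # Placeholder for the encoder's true subsampling formula
    # (roughly a factor of 4 with some edge loss): 100 -> 23.
    return (num_frames - 7) // 4

def keep_cut(num_input_frames: int, num_tokens: int) -> bool:
    # A transducer needs at least as many output frames as tokens.
    return num_frames_after_subsampling(num_input_frames) >= num_tokens

print(keep_cut(100, 24))  # False -> excluded, matching the warnings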
], batch size: 59, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:38:42,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3434673.3333333335, ans=0.125 2023-11-26 14:38:44,741 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.947e+01 8.939e+01 9.563e+01 1.037e+02 1.347e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-26 14:38:45,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3434740.0, ans=0.1 2023-11-26 14:38:55,884 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 14:38:56,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3434806.6666666665, ans=0.125 2023-11-26 14:39:09,146 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.19 vs. limit=22.5 2023-11-26 14:39:09,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3434873.3333333335, ans=0.0 2023-11-26 14:39:17,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3434940.0, ans=0.125 2023-11-26 14:39:20,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3434940.0, ans=0.1 2023-11-26 14:39:21,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3434940.0, ans=0.025 2023-11-26 14:39:26,316 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515250 2023-11-26 14:39:29,457 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10250, loss[loss=0.06252, simple_loss=0.09308, pruned_loss=0.005386, audio_tagging_loss=0.0106, over 15332.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08904, pruned_loss=0.01212, audio_tagging_loss=0.008907, over 3046173.23 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:39:35,776 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.83 vs. limit=15.0 2023-11-26 14:39:38,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3435006.6666666665, ans=0.0 2023-11-26 14:40:22,403 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515300 2023-11-26 14:40:25,492 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10300, loss[loss=0.062, simple_loss=0.08459, pruned_loss=0.01068, audio_tagging_loss=0.009027, over 15098.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08972, pruned_loss=0.01233, audio_tagging_loss=0.008919, over 3047318.72 frames. 
], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:40:27,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3435340.0, ans=0.125 2023-11-26 14:40:36,204 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.928e+01 8.849e+01 9.518e+01 1.017e+02 1.480e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-26 14:40:57,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3435473.3333333335, ans=0.125 2023-11-26 14:41:16,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3435606.6666666665, ans=0.2 2023-11-26 14:41:18,048 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515350 2023-11-26 14:41:21,157 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10350, loss[loss=0.04918, simple_loss=0.0634, pruned_loss=0.007196, audio_tagging_loss=0.01028, over 13514.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09081, pruned_loss=0.01236, audio_tagging_loss=0.0089, over 3046059.15 frames. ], batch size: 53, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:41:29,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3435673.3333333335, ans=0.125 2023-11-26 14:41:41,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3435740.0, ans=0.125 2023-11-26 14:42:03,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3435873.3333333335, ans=0.0 2023-11-26 14:42:13,253 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515400 2023-11-26 14:42:17,226 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10400, loss[loss=0.07052, simple_loss=0.09331, pruned_loss=0.01195, audio_tagging_loss=0.01191, over 15836.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09022, pruned_loss=0.012, audio_tagging_loss=0.00903, over 3056175.39 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 32.0 2023-11-26 14:42:29,076 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.250e+01 8.791e+01 9.464e+01 1.006e+02 1.363e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 14:42:48,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3436140.0, ans=0.125 2023-11-26 14:42:49,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3436206.6666666665, ans=0.125 2023-11-26 14:43:01,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3436273.3333333335, ans=0.2 2023-11-26 14:43:02,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3436273.3333333335, ans=0.125 2023-11-26 14:43:10,156 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515450 2023-11-26 14:43:13,320 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10450, loss[loss=0.05156, simple_loss=0.07155, pruned_loss=0.007483, audio_tagging_loss=0.008301, over 16122.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08914, pruned_loss=0.01186, audio_tagging_loss=0.009044, over 3054535.46 frames. 
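[Annotation] The dense scaling.py:213 "ScheduledFloat" records above report the current value (ans=...) of hyperparameters that are scheduled on batch_count: dropout probabilities, skip rates, balancer probabilities and limits. A minimal piecewise-linear schedule keyed on batch count is sketched below as an illustration of the mechanism; the class name, interface, and breakpoints are assumptions, not the module's actual implementation.

import bisect

class ScheduledValue:
    def __init__(self, *points):  # e.g. (0.0, 0.2), (4000.0, 0.05)
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def __call__(self, batch_count: float) -> float:
        # Clamp outside the breakpoints, interpolate linearly inside.
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

skip_rate = ScheduledValue((0.0, 0.2), (4000.0, 0.05), (16000.0, 0.0))
print(skip_rate(3432540.0))  # -> 0.0 this late in training, as many
                             # of the logged skip rates show (ans=0.0)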
], batch size: 61, lr: 1.56e-03, grad_scale: 32.0 2023-11-26 14:43:24,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3436406.6666666665, ans=0.1 2023-11-26 14:43:27,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3436406.6666666665, ans=0.0 2023-11-26 14:43:28,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3436406.6666666665, ans=10.0 2023-11-26 14:43:35,548 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.71 vs. limit=15.0 2023-11-26 14:43:49,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3436540.0, ans=0.125 2023-11-26 14:43:59,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3436606.6666666665, ans=0.125 2023-11-26 14:44:05,421 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515500 2023-11-26 14:44:08,667 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10500, loss[loss=0.05516, simple_loss=0.07123, pruned_loss=0.009895, audio_tagging_loss=0.009654, over 16151.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08849, pruned_loss=0.01196, audio_tagging_loss=0.008929, over 3046960.62 frames. ], batch size: 61, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:44:14,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3436673.3333333335, ans=0.1 2023-11-26 14:44:20,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3436740.0, ans=0.0 2023-11-26 14:44:20,953 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.233e+01 8.625e+01 9.527e+01 1.023e+02 1.211e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-26 14:44:31,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3436806.6666666665, ans=0.125 2023-11-26 14:45:01,575 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515550 2023-11-26 14:45:01,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3436940.0, ans=0.125 2023-11-26 14:45:03,321 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.47 vs. limit=15.0 2023-11-26 14:45:04,717 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10550, loss[loss=0.05911, simple_loss=0.07698, pruned_loss=0.01039, audio_tagging_loss=0.01024, over 14825.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08792, pruned_loss=0.01185, audio_tagging_loss=0.008897, over 3048929.70 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:45:06,455 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.78 vs. 
limit=15.0 2023-11-26 14:45:50,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3437273.3333333335, ans=0.07 2023-11-26 14:45:58,748 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515600 2023-11-26 14:46:02,167 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10600, loss[loss=0.05992, simple_loss=0.08047, pruned_loss=0.01208, audio_tagging_loss=0.0076, over 16727.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08776, pruned_loss=0.01189, audio_tagging_loss=0.008781, over 3044649.26 frames. ], batch size: 64, lr: 1.56e-03, grad_scale: 8.0 2023-11-26 14:46:08,022 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.99 vs. limit=15.0 2023-11-26 14:46:08,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3437340.0, ans=0.0 2023-11-26 14:46:14,991 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.016e+01 9.031e+01 9.613e+01 1.032e+02 1.237e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-26 14:46:30,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3437473.3333333335, ans=0.125 2023-11-26 14:46:42,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3437540.0, ans=0.0 2023-11-26 14:46:42,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3437540.0, ans=0.125 2023-11-26 14:46:54,443 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515650 2023-11-26 14:46:57,569 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10650, loss[loss=0.0441, simple_loss=0.05823, pruned_loss=0.006341, audio_tagging_loss=0.00864, over 13661.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08816, pruned_loss=0.0121, audio_tagging_loss=0.008732, over 3043426.51 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 8.0 2023-11-26 14:46:57,821 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 14:47:04,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3437673.3333333335, ans=0.0 2023-11-26 14:47:14,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3437740.0, ans=0.0 2023-11-26 14:47:28,066 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.36 vs. limit=22.5 2023-11-26 14:47:42,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3437940.0, ans=0.1 2023-11-26 14:47:50,317 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515700 2023-11-26 14:47:53,414 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10700, loss[loss=0.05052, simple_loss=0.07218, pruned_loss=0.004962, audio_tagging_loss=0.009465, over 14386.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08906, pruned_loss=0.01231, audio_tagging_loss=0.008711, over 3041898.01 frames. 
], batch size: 56, lr: 1.56e-03, grad_scale: 8.0 2023-11-26 14:47:59,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3438006.6666666665, ans=0.125 2023-11-26 14:48:05,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3438073.3333333335, ans=0.125 2023-11-26 14:48:06,628 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.16 vs. limit=15.0 2023-11-26 14:48:07,263 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.864e+01 8.817e+01 9.509e+01 1.036e+02 1.497e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-26 14:48:12,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3438073.3333333335, ans=0.125 2023-11-26 14:48:17,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3438140.0, ans=0.125 2023-11-26 14:48:21,296 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.30 vs. limit=10.0 2023-11-26 14:48:33,055 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 14:48:36,839 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.78 vs. limit=15.0 2023-11-26 14:48:46,189 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.94 vs. limit=22.5 2023-11-26 14:48:46,734 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515750 2023-11-26 14:48:48,025 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 14:48:49,911 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10750, loss[loss=0.06368, simple_loss=0.08932, pruned_loss=0.01154, audio_tagging_loss=0.007482, over 15904.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08876, pruned_loss=0.01216, audio_tagging_loss=0.008668, over 3041645.50 frames. ], batch size: 59, lr: 1.56e-03, grad_scale: 8.0 2023-11-26 14:49:05,860 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.84 vs. limit=22.5 2023-11-26 14:49:17,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3438473.3333333335, ans=15.0 2023-11-26 14:49:35,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3438606.6666666665, ans=0.1 2023-11-26 14:49:41,142 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.82 vs. 
limit=6.0 2023-11-26 14:49:41,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3438606.6666666665, ans=0.2 2023-11-26 14:49:42,678 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515800 2023-11-26 14:49:46,060 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10800, loss[loss=0.05523, simple_loss=0.07205, pruned_loss=0.007708, audio_tagging_loss=0.0115, over 17010.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08822, pruned_loss=0.0119, audio_tagging_loss=0.008711, over 3043175.01 frames. ], batch size: 66, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:49:52,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3438673.3333333335, ans=0.0 2023-11-26 14:49:53,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3438673.3333333335, ans=0.125 2023-11-26 14:49:59,618 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 8.847e+01 9.608e+01 1.038e+02 1.531e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-26 14:50:07,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3438740.0, ans=0.2 2023-11-26 14:50:38,843 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515850 2023-11-26 14:50:42,502 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10850, loss[loss=0.05587, simple_loss=0.07994, pruned_loss=0.008544, audio_tagging_loss=0.00735, over 15540.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08859, pruned_loss=0.01199, audio_tagging_loss=0.008685, over 3046934.51 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:50:55,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3439073.3333333335, ans=0.125 2023-11-26 14:51:00,776 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 14:51:05,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3439140.0, ans=0.0 2023-11-26 14:51:35,726 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515900 2023-11-26 14:51:36,715 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 14:51:38,898 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10900, loss[loss=0.06996, simple_loss=0.09902, pruned_loss=0.01226, audio_tagging_loss=0.008188, over 15682.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08888, pruned_loss=0.01206, audio_tagging_loss=0.008637, over 3049245.33 frames. 
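[Annotation] The scaling.py:1022 "Whitening" records around here compare a per-module metric against a limit (e.g. metric=1.82 vs. limit=6.0 for whiten_keys just above). The metric measures how far the channel covariance of the activations is from a scaled identity, equaling 1.0 when the features are perfectly "white". One plausible formulation is sketched below; it is offered as an interpretation of the logged quantity, not the module's actual code.

import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations.
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]  # (C, C) channel covariance
    d = cov.shape[0]
    # d * tr(C^2) / tr(C)^2 >= 1, with equality iff all eigenvalues
    # of C are equal, i.e. the features are white.
    return (d * torch.trace(cov @ cov) / torch.trace(cov) ** 2).item()

x = torch.randn(1000, 192) * torch.linspace(0.5, 2.0, 192)
print(whitening_metric(x))  # > 1.0 for unequal channel variances

When the metric exceeds the limit, the module presumably applies a penalty gradient nudging the activations back toward whiteness, which is why the log only prints the cases "metric ... vs. limit".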
], batch size: 58, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:51:40,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3439340.0, ans=0.125 2023-11-26 14:51:40,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3439340.0, ans=0.1 2023-11-26 14:51:40,370 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.26 vs. limit=15.0 2023-11-26 14:51:52,168 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 9.005e+01 9.638e+01 1.044e+02 1.421e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-26 14:51:56,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3439406.6666666665, ans=0.125 2023-11-26 14:52:13,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3439540.0, ans=0.125 2023-11-26 14:52:13,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3439540.0, ans=0.125 2023-11-26 14:52:21,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3439540.0, ans=0.0 2023-11-26 14:52:21,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3439540.0, ans=0.1 2023-11-26 14:52:28,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3439606.6666666665, ans=0.1 2023-11-26 14:52:31,167 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 515950 2023-11-26 14:52:34,980 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 10950, loss[loss=0.08498, simple_loss=0.1191, pruned_loss=0.01751, audio_tagging_loss=0.007912, over 15174.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.0889, pruned_loss=0.01201, audio_tagging_loss=0.008749, over 3049198.19 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:53:02,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3439806.6666666665, ans=0.125 2023-11-26 14:53:09,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3439873.3333333335, ans=0.0 2023-11-26 14:53:22,800 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.12 vs. limit=15.0 2023-11-26 14:53:27,465 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516000 2023-11-26 14:53:28,841 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-516000.pt 2023-11-26 14:53:32,864 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11000, loss[loss=0.06558, simple_loss=0.09438, pruned_loss=0.0103, audio_tagging_loss=0.008095, over 15788.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.0885, pruned_loss=0.01195, audio_tagging_loss=0.008884, over 3052162.33 frames. 
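[Annotation] The checkpoint.py record above ("Saving checkpoint to .../checkpoint-516000.pt") fires at a round global batch index, suggesting a fixed batch-level save interval alongside the per-epoch checkpoints. A sketch of that pattern follows; the interval, dict keys, and function name are assumptions.

from pathlib import Path
import torch

def maybe_save(batch_idx: int, model, optimizer,
               exp_dir: Path, every_n: int = 4000) -> None:
    # Write model/optimizer state with the global batch index in the
    # filename, e.g. checkpoint-516000.pt as in the record above.
    if batch_idx == 0 or batch_idx % every_n != 0:
        return
    torch.save(
        {
            "batch_idx_train": batch_idx,
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
        },
        exp_dir / f"checkpoint-{batch_idx}.pt",
    )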
], batch size: 56, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:53:33,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3440006.6666666665, ans=0.125 2023-11-26 14:53:43,509 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 14:53:47,232 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.917e+01 8.791e+01 9.278e+01 1.005e+02 1.404e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 14:54:05,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3440206.6666666665, ans=0.125 2023-11-26 14:54:12,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3440206.6666666665, ans=0.125 2023-11-26 14:54:18,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=3440273.3333333335, ans=0.02 2023-11-26 14:54:25,794 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516050 2023-11-26 14:54:29,438 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11050, loss[loss=0.06933, simple_loss=0.1001, pruned_loss=0.01179, audio_tagging_loss=0.007473, over 16048.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.0893, pruned_loss=0.01206, audio_tagging_loss=0.008921, over 3049234.98 frames. ], batch size: 60, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:54:37,425 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.02 vs. limit=15.0 2023-11-26 14:54:49,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3440406.6666666665, ans=0.125 2023-11-26 14:54:53,877 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.39 vs. limit=22.5 2023-11-26 14:55:20,003 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.70 vs. limit=15.0 2023-11-26 14:55:20,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3440606.6666666665, ans=0.07 2023-11-26 14:55:20,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3440606.6666666665, ans=0.125 2023-11-26 14:55:21,590 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516100 2023-11-26 14:55:21,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3440606.6666666665, ans=0.125 2023-11-26 14:55:24,801 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11100, loss[loss=0.05903, simple_loss=0.07462, pruned_loss=0.01065, audio_tagging_loss=0.01107, over 15265.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.08996, pruned_loss=0.01225, audio_tagging_loss=0.009035, over 3047448.50 frames. 
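[Annotation] The grad_scale field in the batch records is the fp16 dynamic loss scale, and its trajectory in this section (32.0 at batch 10000, 16.0 at 10050, 8.0 at 10600, back to 16.0 at 10800 and 32.0 at 11200) is the classic behavior of a dynamic scaler: halve on an overflowing step, then gradually raise the scale again after a run of finite steps. A sketch of such a training step using PyTorch's standard GradScaler is below; the step structure and growth cadence are assumptions about the recipe, while the GradScaler API itself is standard.

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

def train_step(model, batch, optimizer, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped internally if grads overflowed
    scaler.update()          # halves the scale on overflow, grows it
                             # again after enough clean steps
    return scaler.get_scale()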
], batch size: 58, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:55:28,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3440673.3333333335, ans=0.07 2023-11-26 14:55:28,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3440673.3333333335, ans=15.0 2023-11-26 14:55:38,071 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.080e+01 8.786e+01 9.291e+01 1.014e+02 1.274e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-26 14:55:48,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3440806.6666666665, ans=0.07 2023-11-26 14:55:54,260 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2023-11-26 14:56:01,579 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.83 vs. limit=15.0 2023-11-26 14:56:09,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3440940.0, ans=0.125 2023-11-26 14:56:13,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3440940.0, ans=0.05 2023-11-26 14:56:17,662 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516150 2023-11-26 14:56:20,720 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11150, loss[loss=0.0764, simple_loss=0.1043, pruned_loss=0.01403, audio_tagging_loss=0.01021, over 14421.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09036, pruned_loss=0.01235, audio_tagging_loss=0.009102, over 3057516.10 frames. ], batch size: 54, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:56:36,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3441073.3333333335, ans=0.1 2023-11-26 14:56:47,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3441140.0, ans=0.07 2023-11-26 14:57:00,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3441206.6666666665, ans=0.1 2023-11-26 14:57:02,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=3441206.6666666665, ans=0.1 2023-11-26 14:57:13,038 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.74 vs. limit=15.0 2023-11-26 14:57:13,681 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516200 2023-11-26 14:57:18,223 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11200, loss[loss=0.0482, simple_loss=0.06805, pruned_loss=0.005621, audio_tagging_loss=0.008551, over 16333.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.08979, pruned_loss=0.01231, audio_tagging_loss=0.009202, over 3054059.76 frames. ], batch size: 63, lr: 1.56e-03, grad_scale: 32.0 2023-11-26 14:57:29,615 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.90 vs. 
limit=15.0 2023-11-26 14:57:30,983 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.675e+01 8.740e+01 9.384e+01 1.028e+02 1.331e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 14:57:49,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3441540.0, ans=0.125 2023-11-26 14:58:06,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3441606.6666666665, ans=0.125 2023-11-26 14:58:06,571 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2023-11-26 14:58:10,357 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516250 2023-11-26 14:58:11,912 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.21 vs. limit=15.0 2023-11-26 14:58:13,525 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11250, loss[loss=0.06579, simple_loss=0.08657, pruned_loss=0.01179, audio_tagging_loss=0.01071, over 15673.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08924, pruned_loss=0.01226, audio_tagging_loss=0.0092, over 3048916.66 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:58:14,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3441673.3333333335, ans=0.025 2023-11-26 14:58:35,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3441806.6666666665, ans=0.125 2023-11-26 14:58:51,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3441873.3333333335, ans=0.125 2023-11-26 14:59:01,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3441940.0, ans=0.125 2023-11-26 14:59:02,297 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.52 vs. limit=15.0 2023-11-26 14:59:05,994 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516300 2023-11-26 14:59:09,213 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11300, loss[loss=0.07041, simple_loss=0.09165, pruned_loss=0.01502, audio_tagging_loss=0.009561, over 16334.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08903, pruned_loss=0.01231, audio_tagging_loss=0.00905, over 3049452.07 frames. 
], batch size: 63, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:59:20,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3442073.3333333335, ans=0.125 2023-11-26 14:59:22,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3442073.3333333335, ans=0.125 2023-11-26 14:59:24,713 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.246e+01 8.684e+01 9.357e+01 1.017e+02 1.284e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 14:59:30,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3442073.3333333335, ans=0.0 2023-11-26 14:59:30,949 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.38 vs. limit=12.0 2023-11-26 14:59:47,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3442206.6666666665, ans=0.125 2023-11-26 15:00:00,505 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.94 vs. limit=15.0 2023-11-26 15:00:02,085 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516350 2023-11-26 15:00:05,783 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11350, loss[loss=0.06667, simple_loss=0.08213, pruned_loss=0.01461, audio_tagging_loss=0.011, over 15327.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08935, pruned_loss=0.01228, audio_tagging_loss=0.008902, over 3040852.51 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:00:08,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3442340.0, ans=0.2 2023-11-26 15:00:36,708 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0 2023-11-26 15:00:58,406 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516400 2023-11-26 15:01:01,799 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11400, loss[loss=0.07498, simple_loss=0.1054, pruned_loss=0.01596, audio_tagging_loss=0.006313, over 14373.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.091, pruned_loss=0.0126, audio_tagging_loss=0.008745, over 3039146.11 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:01:15,667 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.526e+01 8.777e+01 9.213e+01 1.005e+02 1.277e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-26 15:01:38,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3442873.3333333335, ans=0.125 2023-11-26 15:01:44,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3442873.3333333335, ans=0.0 2023-11-26 15:01:53,930 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516450 2023-11-26 15:01:57,088 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11450, loss[loss=0.05887, simple_loss=0.08211, pruned_loss=0.01172, audio_tagging_loss=0.006095, over 14696.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09045, pruned_loss=0.01246, audio_tagging_loss=0.008769, over 3031858.27 frames. 
], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:02:01,007 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.33 vs. limit=10.0 2023-11-26 15:02:15,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3443073.3333333335, ans=0.125 2023-11-26 15:02:33,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3443206.6666666665, ans=0.125 2023-11-26 15:02:39,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3443206.6666666665, ans=0.125 2023-11-26 15:02:49,878 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516500 2023-11-26 15:02:53,541 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11500, loss[loss=0.06625, simple_loss=0.09349, pruned_loss=0.01413, audio_tagging_loss=0.00538, over 14543.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09053, pruned_loss=0.01243, audio_tagging_loss=0.008733, over 3037934.61 frames. ], batch size: 53, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:02:54,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3443340.0, ans=0.125 2023-11-26 15:02:58,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3443340.0, ans=0.1 2023-11-26 15:03:08,392 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.464e+01 8.894e+01 9.338e+01 1.016e+02 1.234e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 15:03:09,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3443406.6666666665, ans=0.125 2023-11-26 15:03:09,985 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.91 vs. limit=6.0 2023-11-26 15:03:33,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3443540.0, ans=0.2 2023-11-26 15:03:35,334 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.31 vs. limit=15.0 2023-11-26 15:03:45,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3443606.6666666665, ans=0.0 2023-11-26 15:03:46,763 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516550 2023-11-26 15:03:49,910 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11550, loss[loss=0.06681, simple_loss=0.09308, pruned_loss=0.01127, audio_tagging_loss=0.009, over 16275.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09056, pruned_loss=0.0125, audio_tagging_loss=0.008801, over 3047594.68 frames. ], batch size: 61, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:03:52,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3443673.3333333335, ans=0.1 2023-11-26 15:04:07,401 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.36 vs. 
limit=22.5 2023-11-26 15:04:25,131 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 15:04:26,613 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.84 vs. limit=15.0 2023-11-26 15:04:42,161 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516600 2023-11-26 15:04:43,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3443940.0, ans=0.0 2023-11-26 15:04:44,199 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.76 vs. limit=22.5 2023-11-26 15:04:45,629 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11600, loss[loss=0.06682, simple_loss=0.08595, pruned_loss=0.01255, audio_tagging_loss=0.0113, over 15262.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09154, pruned_loss=0.01268, audio_tagging_loss=0.008666, over 3046012.61 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 32.0 2023-11-26 15:04:48,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3444006.6666666665, ans=0.0 2023-11-26 15:05:00,073 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 8.905e+01 9.507e+01 1.006e+02 1.398e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-26 15:05:15,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3444140.0, ans=0.125 2023-11-26 15:05:26,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3444206.6666666665, ans=0.09899494936611666 2023-11-26 15:05:28,008 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.27 vs. limit=15.0 2023-11-26 15:05:37,845 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516650 2023-11-26 15:05:41,438 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11650, loss[loss=0.07316, simple_loss=0.09931, pruned_loss=0.01442, audio_tagging_loss=0.009092, over 15371.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09067, pruned_loss=0.01251, audio_tagging_loss=0.008764, over 3039409.38 frames. ], batch size: 59, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:05:47,810 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0 2023-11-26 15:05:52,671 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. 
limit=6.0 2023-11-26 15:05:54,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3444406.6666666665, ans=0.125 2023-11-26 15:05:57,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3444406.6666666665, ans=0.125 2023-11-26 15:06:34,846 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516700 2023-11-26 15:06:37,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3444673.3333333335, ans=0.125 2023-11-26 15:06:37,975 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11700, loss[loss=0.06164, simple_loss=0.08, pruned_loss=0.01306, audio_tagging_loss=0.008576, over 14770.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08995, pruned_loss=0.01227, audio_tagging_loss=0.008781, over 3042695.08 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:06:52,883 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.614e+01 8.754e+01 9.292e+01 9.879e+01 2.063e+02, threshold=1.858e+02, percent-clipped=1.0 2023-11-26 15:07:10,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3444873.3333333335, ans=0.0 2023-11-26 15:07:23,228 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:07:29,507 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516750 2023-11-26 15:07:32,641 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11750, loss[loss=0.0546, simple_loss=0.07075, pruned_loss=0.008236, audio_tagging_loss=0.01099, over 15395.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.091, pruned_loss=0.01223, audio_tagging_loss=0.008753, over 3050282.44 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:07:43,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3445073.3333333335, ans=0.125 2023-11-26 15:07:49,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3445073.3333333335, ans=0.0 2023-11-26 15:08:01,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3445140.0, ans=0.125 2023-11-26 15:08:04,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3445140.0, ans=0.0 2023-11-26 15:08:06,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3445206.6666666665, ans=0.125 2023-11-26 15:08:20,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3445273.3333333335, ans=0.1 2023-11-26 15:08:23,616 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.75 vs. limit=8.0 2023-11-26 15:08:24,954 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516800 2023-11-26 15:08:28,238 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11800, loss[loss=0.06884, simple_loss=0.08716, pruned_loss=0.0147, audio_tagging_loss=0.01056, over 15368.00 frames. 
], tot_loss[loss=0.06623, simple_loss=0.09034, pruned_loss=0.01226, audio_tagging_loss=0.008807, over 3035136.07 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:08:32,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3445340.0, ans=0.125 2023-11-26 15:08:34,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3445340.0, ans=0.125 2023-11-26 15:08:45,371 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.466e+01 8.777e+01 9.316e+01 1.001e+02 1.366e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-26 15:08:51,229 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.07 vs. limit=22.5 2023-11-26 15:08:55,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3445473.3333333335, ans=0.2 2023-11-26 15:09:08,275 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.94 vs. limit=6.0 2023-11-26 15:09:17,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3445606.6666666665, ans=0.0 2023-11-26 15:09:22,274 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516850 2023-11-26 15:09:25,434 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11850, loss[loss=0.06012, simple_loss=0.08099, pruned_loss=0.01126, audio_tagging_loss=0.008362, over 14756.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09066, pruned_loss=0.01219, audio_tagging_loss=0.008877, over 3035582.24 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:09:36,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3445740.0, ans=0.125 2023-11-26 15:09:43,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3445740.0, ans=0.125 2023-11-26 15:09:47,717 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.59 vs. limit=15.0 2023-11-26 15:09:55,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3445806.6666666665, ans=0.125 2023-11-26 15:09:55,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3445806.6666666665, ans=0.1 2023-11-26 15:10:10,795 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:10:17,984 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516900 2023-11-26 15:10:21,061 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11900, loss[loss=0.05345, simple_loss=0.07261, pruned_loss=0.007976, audio_tagging_loss=0.009172, over 15513.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08995, pruned_loss=0.01212, audio_tagging_loss=0.008935, over 3037576.91 frames. 
], batch size: 59, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:10:31,032 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.22 vs. limit=22.5 2023-11-26 15:10:35,761 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.570e+01 9.365e+01 9.968e+01 1.257e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 15:10:49,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3446140.0, ans=0.125 2023-11-26 15:11:13,056 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 516950 2023-11-26 15:11:14,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3446273.3333333335, ans=0.2 2023-11-26 15:11:16,148 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 11950, loss[loss=0.06305, simple_loss=0.07919, pruned_loss=0.01255, audio_tagging_loss=0.0109, over 14745.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.0906, pruned_loss=0.01243, audio_tagging_loss=0.008893, over 3041958.21 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:11:32,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3446406.6666666665, ans=0.0 2023-11-26 15:11:43,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3446473.3333333335, ans=0.0 2023-11-26 15:11:51,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3446540.0, ans=0.2 2023-11-26 15:11:51,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3446540.0, ans=0.125 2023-11-26 15:11:58,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3446540.0, ans=0.125 2023-11-26 15:12:07,686 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517000 2023-11-26 15:12:07,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3446606.6666666665, ans=0.1 2023-11-26 15:12:09,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3446606.6666666665, ans=0.125 2023-11-26 15:12:11,025 INFO [train_asr.py:1235] (0/4) Epoch 43, batch 12000, loss[loss=0.05866, simple_loss=0.08385, pruned_loss=0.008439, audio_tagging_loss=0.008303, over 14733.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09097, pruned_loss=0.0126, audio_tagging_loss=0.008878, over 3042896.24 frames. ], batch size: 54, lr: 1.56e-03, grad_scale: 32.0 2023-11-26 15:12:11,027 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-26 15:12:38,772 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.4608, 3.8940, 4.3895, 3.5617], device='cuda:0') 2023-11-26 15:12:43,908 INFO [train_asr.py:1267] (0/4) Epoch 43, validation: loss=0.05829, simple_loss=0.05056, pruned_loss=0.00528, audio_tagging_loss=0.02773, over 4681554.00 frames. 
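Two regularities in the records above are worth noting. The printed loss matches 0.5 * simple_loss + pruned_loss + audio_tagging_loss (e.g. the validation record directly above: 0.5 * 0.05056 + 0.00528 + 0.02773 = 0.05829), and the optim.py clipping threshold matches Clipping_scale times the median grad-norm quartile (e.g. 2.0 * 9.365e+01 = 1.873e+02 in the record earlier in this stretch). A minimal Python sketch of both relations, inferred from the logged numbers rather than taken from the icefall source (function names are illustrative):

def logged_total_loss(simple_loss, pruned_loss, audio_tagging_loss,
                      simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
    # How the printed `loss` appears to combine its components; the 0.5
    # weight on simple_loss is inferred from the logged values above,
    # not read out of train_asr.py.
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Validation record above: loss=0.05829, simple_loss=0.05056,
# pruned_loss=0.00528, audio_tagging_loss=0.02773.
assert abs(logged_total_loss(0.05056, 0.00528, 0.02773) - 0.05829) < 1e-5

def clip_threshold(median_grad_norm, clipping_scale=2.0):
    # The optim.py records report grad-norm quartiles followed by
    # threshold = Clipping_scale * median, e.g. 2.0 * 9.365e+01 = 1.873e+02.
    return clipping_scale * median_grad_norm

assert abs(clip_threshold(9.365e+01) - 1.873e+02) < 1e-6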
2023-11-26 15:12:43,909 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-26 15:12:45,258 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:12:58,687 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.129e+01 8.906e+01 9.562e+01 1.016e+02 1.213e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-26 15:13:08,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3446806.6666666665, ans=0.0 2023-11-26 15:13:11,072 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-43.pt 2023-11-26 15:13:38,089 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 0, loss[loss=0.08209, simple_loss=0.09616, pruned_loss=0.01193, audio_tagging_loss=0.02208, over 15145.00 frames. ], tot_loss[loss=0.08209, simple_loss=0.09616, pruned_loss=0.01193, audio_tagging_loss=0.02208, over 15145.00 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 15:13:38,091 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-26 15:14:01,418 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3237, 4.2926, 4.4918, 4.4793], device='cuda:0') 2023-11-26 15:14:08,285 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.9727, 3.9666, 4.8625, 4.5211], device='cuda:0') 2023-11-26 15:14:09,403 INFO [train_asr.py:1267] (0/4) Epoch 44, validation: loss=0.05821, simple_loss=0.05063, pruned_loss=0.005319, audio_tagging_loss=0.02758, over 4681554.00 frames. 2023-11-26 15:14:09,403 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-26 15:14:14,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3446840.0, ans=0.125 2023-11-26 15:14:34,449 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517050 2023-11-26 15:14:34,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3446973.3333333335, ans=0.125 2023-11-26 15:14:51,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3447040.0, ans=0.0 2023-11-26 15:14:57,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3447106.6666666665, ans=0.125 2023-11-26 15:14:58,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3447106.6666666665, ans=0.125 2023-11-26 15:15:05,080 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 50, loss[loss=0.06639, simple_loss=0.0871, pruned_loss=0.009231, audio_tagging_loss=0.01361, over 14502.00 frames. ], tot_loss[loss=0.07497, simple_loss=0.09277, pruned_loss=0.01228, audio_tagging_loss=0.0163, over 686110.90 frames. 
], batch size: 55, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 15:15:05,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3447173.3333333335, ans=0.125 2023-11-26 15:15:17,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3447240.0, ans=0.125 2023-11-26 15:15:23,460 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.99 vs. limit=15.0 2023-11-26 15:15:29,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3447306.6666666665, ans=0.125 2023-11-26 15:15:30,256 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517100 2023-11-26 15:15:32,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3447306.6666666665, ans=0.125 2023-11-26 15:15:39,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3447373.3333333335, ans=0.2 2023-11-26 15:15:47,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3447373.3333333335, ans=0.0 2023-11-26 15:15:47,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3447373.3333333335, ans=0.0 2023-11-26 15:15:48,797 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.397e+01 9.647e+01 1.037e+02 1.149e+02 1.439e+02, threshold=2.073e+02, percent-clipped=0.0 2023-11-26 15:15:54,289 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.77 vs. limit=12.0 2023-11-26 15:15:57,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3447440.0, ans=0.0 2023-11-26 15:16:01,677 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 100, loss[loss=0.04725, simple_loss=0.04978, pruned_loss=0.005351, audio_tagging_loss=0.01701, over 14981.00 frames. ], tot_loss[loss=0.073, simple_loss=0.09036, pruned_loss=0.01199, audio_tagging_loss=0.01584, over 1212282.56 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:16:04,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3447506.6666666665, ans=0.125 2023-11-26 15:16:08,165 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.49 vs. limit=10.0 2023-11-26 15:16:12,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3447573.3333333335, ans=0.0 2023-11-26 15:16:18,948 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.89 vs. 
limit=15.0 2023-11-26 15:16:22,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3447640.0, ans=0.1 2023-11-26 15:16:26,514 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517150 2023-11-26 15:16:30,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3447640.0, ans=0.125 2023-11-26 15:16:32,403 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.12 vs. limit=15.0 2023-11-26 15:16:33,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3447640.0, ans=0.125 2023-11-26 15:16:45,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3447706.6666666665, ans=0.2 2023-11-26 15:16:45,626 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.77 vs. limit=12.0 2023-11-26 15:16:57,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3447840.0, ans=0.125 2023-11-26 15:16:58,403 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 150, loss[loss=0.06759, simple_loss=0.09076, pruned_loss=0.01269, audio_tagging_loss=0.009521, over 14889.00 frames. ], tot_loss[loss=0.07128, simple_loss=0.08958, pruned_loss=0.01214, audio_tagging_loss=0.01435, over 1623112.61 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:17:10,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3447906.6666666665, ans=6.0 2023-11-26 15:17:10,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3447906.6666666665, ans=0.125 2023-11-26 15:17:11,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3447906.6666666665, ans=0.125 2023-11-26 15:17:23,542 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517200 2023-11-26 15:17:24,157 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.92 vs. limit=15.0 2023-11-26 15:17:26,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3447973.3333333335, ans=0.0 2023-11-26 15:17:38,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3448040.0, ans=0.125 2023-11-26 15:17:43,503 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.508e+01 9.130e+01 9.675e+01 1.049e+02 1.216e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-26 15:17:47,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3448106.6666666665, ans=0.125 2023-11-26 15:17:54,481 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 200, loss[loss=0.05244, simple_loss=0.06416, pruned_loss=0.009178, audio_tagging_loss=0.01118, over 15136.00 frames. ], tot_loss[loss=0.07028, simple_loss=0.09043, pruned_loss=0.01236, audio_tagging_loss=0.0127, over 1935519.47 frames. 
], batch size: 61, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:18:19,035 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517250 2023-11-26 15:18:36,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3448373.3333333335, ans=0.1 2023-11-26 15:18:41,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3448440.0, ans=0.1 2023-11-26 15:18:51,359 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 250, loss[loss=0.06912, simple_loss=0.09931, pruned_loss=0.01273, audio_tagging_loss=0.006732, over 15422.00 frames. ], tot_loss[loss=0.06893, simple_loss=0.09021, pruned_loss=0.01228, audio_tagging_loss=0.01155, over 2181376.47 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:19:03,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3448573.3333333335, ans=0.125 2023-11-26 15:19:13,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3448640.0, ans=0.2 2023-11-26 15:19:15,426 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517300 2023-11-26 15:19:36,416 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.235e+01 8.917e+01 9.750e+01 1.047e+02 1.492e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-26 15:19:43,424 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=22.5 2023-11-26 15:19:46,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3448840.0, ans=0.0 2023-11-26 15:19:46,943 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 300, loss[loss=0.0543, simple_loss=0.0685, pruned_loss=0.00924, audio_tagging_loss=0.01081, over 15270.00 frames. ], tot_loss[loss=0.06859, simple_loss=0.0911, pruned_loss=0.01237, audio_tagging_loss=0.01066, over 2378216.98 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:19:49,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=3448840.0, ans=0.025 2023-11-26 15:19:52,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3448840.0, ans=0.1 2023-11-26 15:19:56,747 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.89 vs. limit=22.5 2023-11-26 15:20:08,918 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.61 vs. limit=15.0 2023-11-26 15:20:12,217 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517350 2023-11-26 15:20:26,170 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.32 vs. limit=10.0 2023-11-26 15:20:43,514 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 350, loss[loss=0.06724, simple_loss=0.0913, pruned_loss=0.01263, audio_tagging_loss=0.008954, over 15447.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.09052, pruned_loss=0.01217, audio_tagging_loss=0.01004, over 2531975.24 frames. 
], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:20:43,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3449173.3333333335, ans=0.125 2023-11-26 15:20:59,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3449240.0, ans=0.0 2023-11-26 15:21:07,956 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517400 2023-11-26 15:21:28,360 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 9.037e+01 9.510e+01 1.047e+02 1.188e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-26 15:21:37,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3449440.0, ans=0.1 2023-11-26 15:21:40,187 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 400, loss[loss=0.05189, simple_loss=0.06273, pruned_loss=0.00857, audio_tagging_loss=0.01195, over 14289.00 frames. ], tot_loss[loss=0.06776, simple_loss=0.09091, pruned_loss=0.01247, audio_tagging_loss=0.009829, over 2649092.55 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 15:21:53,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3449573.3333333335, ans=0.125 2023-11-26 15:22:02,442 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.18 vs. limit=6.0 2023-11-26 15:22:03,999 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517450 2023-11-26 15:22:24,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3449773.3333333335, ans=0.125 2023-11-26 15:22:25,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3449773.3333333335, ans=0.0 2023-11-26 15:22:29,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3449773.3333333335, ans=0.0 2023-11-26 15:22:35,245 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 450, loss[loss=0.05341, simple_loss=0.07736, pruned_loss=0.008243, audio_tagging_loss=0.006493, over 15310.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09052, pruned_loss=0.01243, audio_tagging_loss=0.009483, over 2735444.03 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 15:22:35,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3449840.0, ans=0.125 2023-11-26 15:22:47,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3449906.6666666665, ans=0.125 2023-11-26 15:23:00,165 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517500 2023-11-26 15:23:07,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3449973.3333333335, ans=0.125 2023-11-26 15:23:20,514 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.561e+01 8.956e+01 9.480e+01 1.000e+02 1.239e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-26 15:23:31,846 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 500, loss[loss=0.06737, simple_loss=0.0871, pruned_loss=0.01463, audio_tagging_loss=0.009195, over 15106.00 frames. 
], tot_loss[loss=0.06752, simple_loss=0.09087, pruned_loss=0.01279, audio_tagging_loss=0.009289, over 2806718.37 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:23:32,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3450173.3333333335, ans=0.0 2023-11-26 15:23:49,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3450240.0, ans=0.125 2023-11-26 15:23:57,011 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517550 2023-11-26 15:24:02,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3450306.6666666665, ans=0.1 2023-11-26 15:24:05,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3450373.3333333335, ans=0.0 2023-11-26 15:24:09,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3450373.3333333335, ans=0.95 2023-11-26 15:24:14,983 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.09 vs. limit=12.0 2023-11-26 15:24:28,926 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 550, loss[loss=0.07716, simple_loss=0.1037, pruned_loss=0.01662, audio_tagging_loss=0.008657, over 15086.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09032, pruned_loss=0.01253, audio_tagging_loss=0.009133, over 2848657.15 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:24:29,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3450506.6666666665, ans=0.125 2023-11-26 15:24:39,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3450573.3333333335, ans=0.125 2023-11-26 15:24:52,328 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517600 2023-11-26 15:25:11,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3450706.6666666665, ans=0.0 2023-11-26 15:25:14,501 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.620e+01 8.780e+01 9.554e+01 1.038e+02 1.321e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 15:25:24,060 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 600, loss[loss=0.07563, simple_loss=0.1086, pruned_loss=0.01464, audio_tagging_loss=0.00667, over 14552.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.08933, pruned_loss=0.01239, audio_tagging_loss=0.009155, over 2894518.76 frames. 
], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:25:27,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3450840.0, ans=0.125 2023-11-26 15:25:35,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3450906.6666666665, ans=0.1 2023-11-26 15:25:37,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3450906.6666666665, ans=0.0 2023-11-26 15:25:48,493 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517650 2023-11-26 15:26:01,767 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.39 vs. limit=15.0 2023-11-26 15:26:15,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3451106.6666666665, ans=0.2 2023-11-26 15:26:16,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3451106.6666666665, ans=0.125 2023-11-26 15:26:19,362 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 650, loss[loss=0.0729, simple_loss=0.09802, pruned_loss=0.01527, audio_tagging_loss=0.008629, over 13880.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.08961, pruned_loss=0.01247, audio_tagging_loss=0.009016, over 2929660.48 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:26:22,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3451173.3333333335, ans=0.035 2023-11-26 15:26:28,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3451173.3333333335, ans=0.2 2023-11-26 15:26:33,830 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.34 vs. limit=15.0 2023-11-26 15:26:45,075 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517700 2023-11-26 15:26:49,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=3451306.6666666665, ans=15.0 2023-11-26 15:27:04,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3451440.0, ans=0.125 2023-11-26 15:27:06,894 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.760e+01 8.698e+01 9.351e+01 9.946e+01 1.223e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-26 15:27:11,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3451440.0, ans=0.125 2023-11-26 15:27:15,789 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 700, loss[loss=0.0554, simple_loss=0.0698, pruned_loss=0.01026, audio_tagging_loss=0.01025, over 15203.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08906, pruned_loss=0.01229, audio_tagging_loss=0.009049, over 2960801.48 frames. 
], batch size: 58, lr: 1.54e-03, grad_scale: 8.0 2023-11-26 15:27:27,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3451573.3333333335, ans=0.125 2023-11-26 15:27:40,477 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517750 2023-11-26 15:27:58,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3451706.6666666665, ans=0.125 2023-11-26 15:28:03,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3451773.3333333335, ans=0.125 2023-11-26 15:28:12,466 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 750, loss[loss=0.04976, simple_loss=0.06716, pruned_loss=0.005634, audio_tagging_loss=0.01054, over 16674.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.08971, pruned_loss=0.01248, audio_tagging_loss=0.009013, over 2988810.19 frames. ], batch size: 64, lr: 1.54e-03, grad_scale: 8.0 2023-11-26 15:28:35,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3451973.3333333335, ans=0.2 2023-11-26 15:28:36,631 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517800 2023-11-26 15:28:45,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3451973.3333333335, ans=0.1 2023-11-26 15:28:59,978 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.252e+01 8.944e+01 9.681e+01 1.076e+02 1.736e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-26 15:29:05,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3452106.6666666665, ans=0.125 2023-11-26 15:29:07,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3452173.3333333335, ans=0.125 2023-11-26 15:29:08,459 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 800, loss[loss=0.07583, simple_loss=0.1077, pruned_loss=0.01552, audio_tagging_loss=0.006438, over 14956.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.08987, pruned_loss=0.01259, audio_tagging_loss=0.009002, over 3001736.22 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:29:17,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3452173.3333333335, ans=0.1 2023-11-26 15:29:19,162 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.69 vs. 
limit=6.0 2023-11-26 15:29:19,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3452240.0, ans=0.0 2023-11-26 15:29:28,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3452240.0, ans=0.125 2023-11-26 15:29:29,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3452240.0, ans=0.125 2023-11-26 15:29:34,029 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517850 2023-11-26 15:29:37,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3452306.6666666665, ans=0.2 2023-11-26 15:29:41,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3452373.3333333335, ans=0.2 2023-11-26 15:29:58,162 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.82 vs. limit=15.0 2023-11-26 15:30:03,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3452506.6666666665, ans=0.2 2023-11-26 15:30:04,063 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 850, loss[loss=0.09667, simple_loss=0.1348, pruned_loss=0.02437, audio_tagging_loss=0.004879, over 15228.00 frames. ], tot_loss[loss=0.06784, simple_loss=0.09152, pruned_loss=0.01303, audio_tagging_loss=0.009045, over 3016814.81 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:30:16,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3452573.3333333335, ans=0.1 2023-11-26 15:30:16,984 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.03 vs. limit=15.0 2023-11-26 15:30:18,086 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.68 vs. limit=6.0 2023-11-26 15:30:26,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3452640.0, ans=0.125 2023-11-26 15:30:29,241 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517900 2023-11-26 15:30:52,037 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.628e+01 8.827e+01 9.589e+01 1.017e+02 1.364e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-26 15:30:55,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3452773.3333333335, ans=0.125 2023-11-26 15:31:00,599 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 900, loss[loss=0.07152, simple_loss=0.1035, pruned_loss=0.01135, audio_tagging_loss=0.008423, over 14727.00 frames. ], tot_loss[loss=0.06779, simple_loss=0.09178, pruned_loss=0.01283, audio_tagging_loss=0.009071, over 3026766.85 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:31:24,583 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 517950 2023-11-26 15:31:29,275 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.58 vs. 
limit=15.0 2023-11-26 15:31:30,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3452973.3333333335, ans=0.125 2023-11-26 15:31:41,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3453040.0, ans=10.0 2023-11-26 15:31:47,901 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.66 vs. limit=22.5 2023-11-26 15:31:54,757 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 950, loss[loss=0.0618, simple_loss=0.07926, pruned_loss=0.0128, audio_tagging_loss=0.009363, over 14691.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.09068, pruned_loss=0.01261, audio_tagging_loss=0.009006, over 3030042.14 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:31:55,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3453173.3333333335, ans=0.125 2023-11-26 15:32:01,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3453173.3333333335, ans=0.0 2023-11-26 15:32:10,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3453240.0, ans=0.0 2023-11-26 15:32:20,101 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518000 2023-11-26 15:32:28,353 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.79 vs. limit=12.0 2023-11-26 15:32:30,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3453373.3333333335, ans=0.2 2023-11-26 15:32:31,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3453373.3333333335, ans=0.1 2023-11-26 15:32:41,772 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 8.765e+01 9.471e+01 9.957e+01 1.208e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-26 15:32:43,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3453440.0, ans=0.125 2023-11-26 15:32:50,866 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1000, loss[loss=0.07748, simple_loss=0.1097, pruned_loss=0.01498, audio_tagging_loss=0.007636, over 15937.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09066, pruned_loss=0.01257, audio_tagging_loss=0.00879, over 3034345.40 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:32:58,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=3453506.6666666665, ans=0.02 2023-11-26 15:33:07,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3453573.3333333335, ans=0.0 2023-11-26 15:33:13,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3453640.0, ans=0.0 2023-11-26 15:33:14,241 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 15:33:15,319 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518050 2023-11-26 15:33:27,860 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.72 vs. limit=15.0 2023-11-26 15:33:35,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3453773.3333333335, ans=0.125 2023-11-26 15:33:37,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3453773.3333333335, ans=0.125 2023-11-26 15:33:46,654 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2023-11-26 15:33:46,940 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1050, loss[loss=0.05046, simple_loss=0.0732, pruned_loss=0.006254, audio_tagging_loss=0.007609, over 16035.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08982, pruned_loss=0.01242, audio_tagging_loss=0.008623, over 3025806.92 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:33:50,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3453840.0, ans=0.0 2023-11-26 15:33:57,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3453906.6666666665, ans=0.125 2023-11-26 15:34:01,301 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.04 vs. limit=15.0 2023-11-26 15:34:11,135 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518100 2023-11-26 15:34:24,812 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.40 vs. limit=10.0 2023-11-26 15:34:33,433 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.459e+01 8.861e+01 9.465e+01 1.011e+02 1.415e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 15:34:42,001 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1100, loss[loss=0.0471, simple_loss=0.06205, pruned_loss=0.006238, audio_tagging_loss=0.009843, over 14127.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08964, pruned_loss=0.01247, audio_tagging_loss=0.008566, over 3025632.27 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:34:45,231 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
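These WARNING records show the length filter at work on AudioSet placeholder cuts: the 100 input frames become 23 frames after subsampling, fewer than the 24 BPE tokens of the dummy transcript, so no valid alignment exists and the cut is excluded. A minimal sketch of such a filter, assuming a (T - 7) // 4 length rule for the roughly 4x convolutional subsampling (this matches the logged 100 -> 23; the exact formula and check in train_asr.py may differ):

def frames_after_subsampling(t: int) -> int:
    # Assumed ~4x subsampling length formula; consistent with 100 -> 23.
    return (t - 7) // 4

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Exclude a cut when its subsampled length is shorter than its token
    # sequence, since the transducer loss then has no valid alignment.
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert keep_cut(100, 24) is False  # the case logged above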
2023-11-26 15:35:04,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3454306.6666666665, ans=0.0 2023-11-26 15:35:06,596 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518150 2023-11-26 15:35:11,974 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.23 vs. limit=15.0 2023-11-26 15:35:14,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3454373.3333333335, ans=0.05 2023-11-26 15:35:31,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3454440.0, ans=0.125 2023-11-26 15:35:32,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3454440.0, ans=0.1 2023-11-26 15:35:37,333 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1150, loss[loss=0.06148, simple_loss=0.08018, pruned_loss=0.0128, audio_tagging_loss=0.008591, over 14675.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08836, pruned_loss=0.0122, audio_tagging_loss=0.008668, over 3030988.01 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 8.0 2023-11-26 15:35:49,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3454573.3333333335, ans=0.0 2023-11-26 15:36:01,833 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518200 2023-11-26 15:36:03,882 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.03 vs. limit=15.0 2023-11-26 15:36:13,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3454706.6666666665, ans=0.0 2023-11-26 15:36:17,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3454706.6666666665, ans=0.125 2023-11-26 15:36:20,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=3454773.3333333335, ans=0.1 2023-11-26 15:36:24,926 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.587e+01 8.766e+01 9.295e+01 9.999e+01 1.209e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-26 15:36:32,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3454840.0, ans=0.125 2023-11-26 15:36:32,879 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1200, loss[loss=0.054, simple_loss=0.07496, pruned_loss=0.007816, audio_tagging_loss=0.008706, over 15904.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08866, pruned_loss=0.01221, audio_tagging_loss=0.008659, over 3038597.14 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:36:32,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3454840.0, ans=0.015 2023-11-26 15:36:57,062 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518250 2023-11-26 15:37:05,475 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.83 vs.
limit=15.0 2023-11-26 15:37:07,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3455040.0, ans=0.95 2023-11-26 15:37:09,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3455040.0, ans=0.125 2023-11-26 15:37:28,274 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1250, loss[loss=0.06403, simple_loss=0.08593, pruned_loss=0.01517, audio_tagging_loss=0.005904, over 16218.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08896, pruned_loss=0.01223, audio_tagging_loss=0.008646, over 3041855.05 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 8.0 2023-11-26 15:37:30,148 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.92 vs. limit=15.0 2023-11-26 15:37:52,854 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518300 2023-11-26 15:38:14,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3455440.0, ans=0.125 2023-11-26 15:38:15,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3455440.0, ans=0.125 2023-11-26 15:38:15,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3455440.0, ans=0.125 2023-11-26 15:38:16,818 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.326e+01 8.858e+01 9.436e+01 1.015e+02 1.276e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 15:38:23,784 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1300, loss[loss=0.04955, simple_loss=0.06497, pruned_loss=0.006982, audio_tagging_loss=0.01009, over 14156.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08973, pruned_loss=0.01225, audio_tagging_loss=0.008618, over 3047118.61 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 8.0 2023-11-26 15:38:24,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3455506.6666666665, ans=0.125 2023-11-26 15:38:37,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3455573.3333333335, ans=0.1 2023-11-26 15:38:39,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3455573.3333333335, ans=0.07 2023-11-26 15:38:43,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3455573.3333333335, ans=0.125 2023-11-26 15:38:47,687 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.31 vs. limit=15.0 2023-11-26 15:38:48,245 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518350 2023-11-26 15:39:06,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3455706.6666666665, ans=0.125 2023-11-26 15:39:19,313 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1350, loss[loss=0.06962, simple_loss=0.102, pruned_loss=0.0133, audio_tagging_loss=0.005338, over 14859.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08977, pruned_loss=0.01221, audio_tagging_loss=0.008671, over 3043796.52 frames. 
], batch size: 56, lr: 1.54e-03, grad_scale: 8.0 2023-11-26 15:39:28,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3455840.0, ans=0.125 2023-11-26 15:39:37,494 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.61 vs. limit=10.0 2023-11-26 15:39:43,446 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518400 2023-11-26 15:40:01,173 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 15:40:05,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3456106.6666666665, ans=0.0 2023-11-26 15:40:08,635 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 8.757e+01 9.290e+01 1.016e+02 1.312e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-26 15:40:15,037 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1400, loss[loss=0.06327, simple_loss=0.0729, pruned_loss=0.01491, audio_tagging_loss=0.01192, over 15811.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08969, pruned_loss=0.01219, audio_tagging_loss=0.008695, over 3038654.40 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 8.0 2023-11-26 15:40:17,348 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.90 vs. limit=12.0 2023-11-26 15:40:18,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3456173.3333333335, ans=0.125 2023-11-26 15:40:20,694 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=15.0 2023-11-26 15:40:25,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3456240.0, ans=0.125 2023-11-26 15:40:25,945 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.17 vs. limit=10.0 2023-11-26 15:40:26,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3456240.0, ans=0.1 2023-11-26 15:40:26,623 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:40:29,243 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.67 vs. limit=15.0 2023-11-26 15:40:39,338 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.86 vs. limit=15.0 2023-11-26 15:40:39,869 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518450 2023-11-26 15:40:49,892 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.91 vs. 
limit=12.0 2023-11-26 15:40:54,954 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.67 vs. limit=15.0 2023-11-26 15:41:03,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3456440.0, ans=0.125 2023-11-26 15:41:04,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3456440.0, ans=0.1 2023-11-26 15:41:10,960 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1450, loss[loss=0.0741, simple_loss=0.1033, pruned_loss=0.01492, audio_tagging_loss=0.007513, over 14276.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08992, pruned_loss=0.01227, audio_tagging_loss=0.008873, over 3032205.04 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 8.0 2023-11-26 15:41:21,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3456573.3333333335, ans=0.0 2023-11-26 15:41:33,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3456640.0, ans=0.0 2023-11-26 15:41:33,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3456640.0, ans=0.0 2023-11-26 15:41:36,270 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518500 2023-11-26 15:41:44,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3456706.6666666665, ans=0.1 2023-11-26 15:41:54,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3456706.6666666665, ans=0.1 2023-11-26 15:42:00,973 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.759e+01 9.010e+01 9.704e+01 1.035e+02 1.675e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-26 15:42:07,278 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:42:07,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3456840.0, ans=0.125 2023-11-26 15:42:08,039 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1500, loss[loss=0.07861, simple_loss=0.1032, pruned_loss=0.01514, audio_tagging_loss=0.01187, over 14620.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09054, pruned_loss=0.01221, audio_tagging_loss=0.008894, over 3030585.26 frames. 
], batch size: 54, lr: 1.54e-03, grad_scale: 8.0 2023-11-26 15:42:08,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3456840.0, ans=0.125 2023-11-26 15:42:11,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3456840.0, ans=0.2 2023-11-26 15:42:17,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3456840.0, ans=0.125 2023-11-26 15:42:23,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3456906.6666666665, ans=0.125 2023-11-26 15:42:30,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3456973.3333333335, ans=22.5 2023-11-26 15:42:32,705 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518550 2023-11-26 15:42:38,419 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.61 vs. limit=15.0 2023-11-26 15:42:56,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3457106.6666666665, ans=0.0 2023-11-26 15:42:56,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3457106.6666666665, ans=0.0 2023-11-26 15:43:01,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3457106.6666666665, ans=0.125 2023-11-26 15:43:03,424 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1550, loss[loss=0.06027, simple_loss=0.07497, pruned_loss=0.01399, audio_tagging_loss=0.008798, over 14642.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09004, pruned_loss=0.01211, audio_tagging_loss=0.009004, over 3036198.94 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 8.0 2023-11-26 15:43:06,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3457173.3333333335, ans=0.0 2023-11-26 15:43:10,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3457173.3333333335, ans=0.125 2023-11-26 15:43:14,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3457240.0, ans=0.05 2023-11-26 15:43:27,920 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518600 2023-11-26 15:43:39,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3457373.3333333335, ans=0.0 2023-11-26 15:43:41,110 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:43:52,541 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.520e+01 8.861e+01 9.494e+01 1.024e+02 1.186e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-26 15:43:54,326 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.56 vs. 
limit=12.0 2023-11-26 15:43:57,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3457440.0, ans=0.1 2023-11-26 15:43:58,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3457506.6666666665, ans=0.2 2023-11-26 15:43:59,621 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1600, loss[loss=0.06865, simple_loss=0.09472, pruned_loss=0.01235, audio_tagging_loss=0.008936, over 15637.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09033, pruned_loss=0.01212, audio_tagging_loss=0.009085, over 3042597.31 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:44:06,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3457506.6666666665, ans=0.0 2023-11-26 15:44:22,848 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:44:24,881 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518650 2023-11-26 15:44:26,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3457640.0, ans=0.125 2023-11-26 15:44:47,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3457773.3333333335, ans=0.0 2023-11-26 15:44:50,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3457773.3333333335, ans=0.2 2023-11-26 15:44:55,865 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1650, loss[loss=0.05764, simple_loss=0.07356, pruned_loss=0.009854, audio_tagging_loss=0.011, over 16040.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09063, pruned_loss=0.01231, audio_tagging_loss=0.009116, over 3041876.13 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:44:57,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3457840.0, ans=0.025 2023-11-26 15:45:03,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3457840.0, ans=0.1 2023-11-26 15:45:17,545 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.55 vs. limit=10.0 2023-11-26 15:45:20,412 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518700 2023-11-26 15:45:45,761 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.390e+01 9.023e+01 9.377e+01 1.009e+02 1.256e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-26 15:45:47,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3458106.6666666665, ans=0.1 2023-11-26 15:45:52,303 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1700, loss[loss=0.06589, simple_loss=0.08706, pruned_loss=0.01297, audio_tagging_loss=0.009385, over 14271.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.0903, pruned_loss=0.01226, audio_tagging_loss=0.008991, over 3038260.96 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:45:56,264 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.07 vs. 
limit=15.0 2023-11-26 15:46:04,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3458240.0, ans=0.125 2023-11-26 15:46:06,307 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:46:16,690 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518750 2023-11-26 15:46:38,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3458440.0, ans=0.2 2023-11-26 15:46:47,675 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1750, loss[loss=0.04876, simple_loss=0.06525, pruned_loss=0.006841, audio_tagging_loss=0.009295, over 15128.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08914, pruned_loss=0.01214, audio_tagging_loss=0.008889, over 3040216.65 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:46:48,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3458506.6666666665, ans=0.0 2023-11-26 15:47:01,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3458573.3333333335, ans=0.2 2023-11-26 15:47:13,207 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518800 2023-11-26 15:47:34,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3458773.3333333335, ans=0.025 2023-11-26 15:47:34,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3458773.3333333335, ans=0.0 2023-11-26 15:47:38,072 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.678e+01 8.660e+01 9.290e+01 1.019e+02 1.190e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-26 15:47:44,472 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1800, loss[loss=0.07108, simple_loss=0.1018, pruned_loss=0.01225, audio_tagging_loss=0.007927, over 15961.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08981, pruned_loss=0.01204, audio_tagging_loss=0.00878, over 3047560.29 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:47:54,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3458840.0, ans=0.0 2023-11-26 15:47:58,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3458906.6666666665, ans=0.5 2023-11-26 15:48:08,956 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518850 2023-11-26 15:48:10,689 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.94 vs. limit=15.0 2023-11-26 15:48:20,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3459040.0, ans=0.07 2023-11-26 15:48:40,876 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1850, loss[loss=0.07949, simple_loss=0.1024, pruned_loss=0.0193, audio_tagging_loss=0.008999, over 15236.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09043, pruned_loss=0.0122, audio_tagging_loss=0.008665, over 3047236.38 frames. 
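The loss[...] and tot_loss[...] entries above report three components alongside the combined value. Throughout this excerpt the combination is consistent with a fixed linear rule using this run's configured scales (simple_loss_scale=0.5, audio_tagging_loss_scale=1.0). A minimal sketch follows; the rule is inferred from the logged numbers, not copied from train_asr.py:

    # Sketch: how the logged per-batch "loss" appears to relate to its parts.
    # The weights are the run's configured scales; the combination itself is
    # inferred from the logged values, not taken from train_asr.py.
    SIMPLE_LOSS_SCALE = 0.5
    AUDIO_TAGGING_LOSS_SCALE = 1.0

    def combined_loss(simple_loss: float, pruned_loss: float,
                      audio_tagging_loss: float) -> float:
        return (SIMPLE_LOSS_SCALE * simple_loss
                + pruned_loss
                + AUDIO_TAGGING_LOSS_SCALE * audio_tagging_loss)

    # Example from the "Epoch 44, batch 1850" tot_loss entry above:
    # 0.5 * 0.09043 + 0.0122 + 0.008665 = 0.06608, matching loss=0.06609.
    print(combined_loss(0.09043, 0.0122, 0.008665))
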
], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:49:00,876 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.06 vs. limit=22.5 2023-11-26 15:49:03,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3459306.6666666665, ans=0.0 2023-11-26 15:49:05,856 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518900 2023-11-26 15:49:11,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3459306.6666666665, ans=0.0 2023-11-26 15:49:12,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3459306.6666666665, ans=0.2 2023-11-26 15:49:21,263 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.27 vs. limit=15.0 2023-11-26 15:49:25,606 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.03 vs. limit=15.0 2023-11-26 15:49:27,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3459440.0, ans=0.125 2023-11-26 15:49:29,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3459440.0, ans=0.1 2023-11-26 15:49:30,364 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.036e+01 8.755e+01 9.422e+01 1.025e+02 1.230e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 15:49:32,025 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.56 vs. limit=15.0 2023-11-26 15:49:36,725 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1900, loss[loss=0.0649, simple_loss=0.09217, pruned_loss=0.01173, audio_tagging_loss=0.007087, over 15841.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.09007, pruned_loss=0.01218, audio_tagging_loss=0.008564, over 3045830.77 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:49:40,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3459506.6666666665, ans=0.0 2023-11-26 15:49:52,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3459573.3333333335, ans=0.0 2023-11-26 15:50:02,420 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 518950 2023-11-26 15:50:17,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3459706.6666666665, ans=0.125 2023-11-26 15:50:30,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3459773.3333333335, ans=0.125 2023-11-26 15:50:33,048 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 1950, loss[loss=0.06235, simple_loss=0.08983, pruned_loss=0.01091, audio_tagging_loss=0.006523, over 16093.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08995, pruned_loss=0.0123, audio_tagging_loss=0.008631, over 3046540.97 frames. 
], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:50:39,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3459840.0, ans=0.125 2023-11-26 15:50:56,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3459973.3333333335, ans=0.2 2023-11-26 15:50:56,612 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.29 vs. limit=10.0 2023-11-26 15:50:58,245 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519000 2023-11-26 15:50:59,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3459973.3333333335, ans=0.1 2023-11-26 15:51:01,126 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.14 vs. limit=22.5 2023-11-26 15:51:19,836 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:51:23,437 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.206e+01 8.668e+01 9.341e+01 1.000e+02 1.329e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 15:51:23,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3460106.6666666665, ans=0.0 2023-11-26 15:51:30,363 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2000, loss[loss=0.06212, simple_loss=0.08583, pruned_loss=0.01074, audio_tagging_loss=0.00847, over 14308.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09016, pruned_loss=0.01238, audio_tagging_loss=0.008696, over 3047496.65 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 15:51:33,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3460173.3333333335, ans=0.125 2023-11-26 15:51:50,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3460306.6666666665, ans=0.05 2023-11-26 15:51:54,371 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519050 2023-11-26 15:52:23,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3460440.0, ans=0.035 2023-11-26 15:52:24,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3460440.0, ans=0.125 2023-11-26 15:52:26,023 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2050, loss[loss=0.03968, simple_loss=0.0556, pruned_loss=0.005789, audio_tagging_loss=0.006086, over 14402.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08981, pruned_loss=0.01239, audio_tagging_loss=0.008642, over 3048029.35 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 15:52:34,158 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.39 vs. 
limit=15.0 2023-11-26 15:52:39,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3460573.3333333335, ans=0.125 2023-11-26 15:52:42,048 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.70 vs. limit=15.0 2023-11-26 15:52:51,674 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519100 2023-11-26 15:53:05,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3460706.6666666665, ans=0.035 2023-11-26 15:53:08,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3460706.6666666665, ans=0.125 2023-11-26 15:53:15,087 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 8.697e+01 9.387e+01 1.014e+02 2.680e+02, threshold=1.877e+02, percent-clipped=1.0 2023-11-26 15:53:15,738 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.01 vs. limit=15.0 2023-11-26 15:53:20,340 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.94 vs. limit=6.0 2023-11-26 15:53:21,941 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2100, loss[loss=0.06716, simple_loss=0.08608, pruned_loss=0.01325, audio_tagging_loss=0.01087, over 15296.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09001, pruned_loss=0.01243, audio_tagging_loss=0.008614, over 3044132.88 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 15:53:40,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3460906.6666666665, ans=0.125 2023-11-26 15:53:46,622 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519150 2023-11-26 15:53:51,336 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2023-11-26 15:54:13,540 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:54:16,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3461106.6666666665, ans=0.07 2023-11-26 15:54:17,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3461173.3333333335, ans=0.0 2023-11-26 15:54:18,649 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2150, loss[loss=0.05089, simple_loss=0.0692, pruned_loss=0.007262, audio_tagging_loss=0.009029, over 15202.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08963, pruned_loss=0.01236, audio_tagging_loss=0.008616, over 3045724.86 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 15:54:22,525 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.13 vs. 
limit=22.5 2023-11-26 15:54:29,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3461240.0, ans=0.2 2023-11-26 15:54:30,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3461240.0, ans=0.0 2023-11-26 15:54:32,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3461240.0, ans=0.2 2023-11-26 15:54:40,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3461306.6666666665, ans=0.125 2023-11-26 15:54:43,005 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519200 2023-11-26 15:54:44,733 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.98 vs. limit=10.0 2023-11-26 15:54:47,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3461306.6666666665, ans=0.07 2023-11-26 15:54:48,732 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:54:52,764 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 15:55:04,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3461440.0, ans=0.125 2023-11-26 15:55:07,732 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.539e+01 9.113e+01 9.715e+01 1.044e+02 1.389e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-26 15:55:13,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3461506.6666666665, ans=0.2 2023-11-26 15:55:14,124 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2200, loss[loss=0.07322, simple_loss=0.08963, pruned_loss=0.01796, audio_tagging_loss=0.01045, over 13938.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.0907, pruned_loss=0.01243, audio_tagging_loss=0.008611, over 3043610.26 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 15:55:16,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3461506.6666666665, ans=0.125 2023-11-26 15:55:35,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3461640.0, ans=0.0 2023-11-26 15:55:39,018 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519250 2023-11-26 15:55:39,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3461640.0, ans=0.125 2023-11-26 15:56:03,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3461773.3333333335, ans=0.0 2023-11-26 15:56:10,446 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2250, loss[loss=0.06797, simple_loss=0.09779, pruned_loss=0.009113, audio_tagging_loss=0.00996, over 15176.00 frames. 
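In the optim.py:476 entries, the five grad-norm figures read as min/Q1/median/Q3/max over a recent window, and in every entry here the reported threshold equals Clipping_scale times the median (e.g. 2.0 * 9.715e+01 = 1.943e+02), with percent-clipped the share of steps whose norm exceeded it. A sketch of that arithmetic, assuming this reading; the optimizer's actual bookkeeping may differ:

    import statistics

    def clipping_report(grad_norms: list[float], clipping_scale: float = 2.0):
        # Quartiles of recent gradient norms, as printed by optim.py:476.
        qs = statistics.quantiles(grad_norms, n=4)  # Q1, median, Q3
        # threshold = clipping_scale * median matches every entry in this
        # excerpt; percent-clipped counts norms above that threshold.
        threshold = clipping_scale * qs[1]
        clipped = 100.0 * sum(g > threshold for g in grad_norms) / len(grad_norms)
        return (min(grad_norms), *qs, max(grad_norms)), threshold, clipped
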
], tot_loss[loss=0.06611, simple_loss=0.09023, pruned_loss=0.01237, audio_tagging_loss=0.008626, over 3035346.62 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:56:12,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3461840.0, ans=0.125 2023-11-26 15:56:22,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3461906.6666666665, ans=0.125 2023-11-26 15:56:25,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3461906.6666666665, ans=10.0 2023-11-26 15:56:35,526 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519300 2023-11-26 15:56:35,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3461973.3333333335, ans=10.0 2023-11-26 15:56:37,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3461973.3333333335, ans=0.0 2023-11-26 15:56:43,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3462040.0, ans=0.0 2023-11-26 15:56:44,131 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:56:47,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3462040.0, ans=0.125 2023-11-26 15:57:00,477 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.672e+01 8.882e+01 9.360e+01 1.008e+02 1.473e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-26 15:57:06,966 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2300, loss[loss=0.05457, simple_loss=0.06773, pruned_loss=0.01063, audio_tagging_loss=0.01008, over 13317.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09044, pruned_loss=0.01235, audio_tagging_loss=0.008732, over 3036704.69 frames. ], batch size: 52, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:57:12,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3462173.3333333335, ans=0.125 2023-11-26 15:57:30,735 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519350 2023-11-26 15:57:35,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3462306.6666666665, ans=0.125 2023-11-26 15:57:41,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3462373.3333333335, ans=0.125 2023-11-26 15:57:42,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3462373.3333333335, ans=0.1 2023-11-26 15:57:43,623 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.86 vs. limit=15.0 2023-11-26 15:57:56,356 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 15:58:02,744 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2350, loss[loss=0.04296, simple_loss=0.04953, pruned_loss=0.007395, audio_tagging_loss=0.0108, over 13547.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.0894, pruned_loss=0.01219, audio_tagging_loss=0.008805, over 3033139.52 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:58:05,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3462506.6666666665, ans=0.125 2023-11-26 15:58:26,802 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519400 2023-11-26 15:58:28,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3462640.0, ans=0.2 2023-11-26 15:58:30,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3462640.0, ans=0.125 2023-11-26 15:58:31,123 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.91 vs. limit=6.0 2023-11-26 15:58:52,978 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.279e+01 8.835e+01 9.579e+01 1.049e+02 1.967e+02, threshold=1.916e+02, percent-clipped=1.0 2023-11-26 15:58:58,918 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2400, loss[loss=0.06288, simple_loss=0.08378, pruned_loss=0.01118, audio_tagging_loss=0.00981, over 15105.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08969, pruned_loss=0.0121, audio_tagging_loss=0.00887, over 3032010.80 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 15:59:24,262 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519450 2023-11-26 15:59:24,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3462973.3333333335, ans=0.1 2023-11-26 15:59:24,945 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.85 vs. limit=12.0 2023-11-26 15:59:28,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3462973.3333333335, ans=0.5 2023-11-26 15:59:30,168 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.40 vs. limit=15.0 2023-11-26 15:59:36,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3463040.0, ans=0.125 2023-11-26 15:59:45,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3463106.6666666665, ans=0.125 2023-11-26 15:59:54,867 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2450, loss[loss=0.07018, simple_loss=0.08215, pruned_loss=0.0198, audio_tagging_loss=0.009305, over 14614.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08963, pruned_loss=0.01205, audio_tagging_loss=0.00898, over 3031221.11 frames. 
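The WARNING entries above drop 1-second AudioSet cuts that carry only the placeholder transcript: 100 input frames become 23 encoder frames after subsampling, fewer than the 24 BPE tokens, and a transducer loss cannot align fewer frames than tokens. A sketch of such a filter, where the subsampling formula is an assumption chosen to reproduce the logged 100 -> 23:

    def frames_after_subsampling(num_frames: int) -> int:
        # One plausible form of the Conv2d subsampling arithmetic; it maps
        # 100 input frames to 23 output frames, matching the warning above.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A transducer loss needs at least as many encoder frames as tokens.
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert keep_cut(100, 24) is False  # the excluded 1-second AudioSet cut
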
], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 15:59:59,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3463173.3333333335, ans=0.0 2023-11-26 16:00:20,166 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519500 2023-11-26 16:00:23,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3463306.6666666665, ans=0.0 2023-11-26 16:00:42,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3463440.0, ans=0.125 2023-11-26 16:00:46,235 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.797e+01 8.970e+01 9.604e+01 1.012e+02 1.518e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-26 16:00:52,118 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2500, loss[loss=0.07309, simple_loss=0.09873, pruned_loss=0.01477, audio_tagging_loss=0.00896, over 16945.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09002, pruned_loss=0.0122, audio_tagging_loss=0.008935, over 3041212.59 frames. ], batch size: 62, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:01:05,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3463573.3333333335, ans=0.125 2023-11-26 16:01:07,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3463573.3333333335, ans=0.125 2023-11-26 16:01:10,359 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.00 vs. limit=15.0 2023-11-26 16:01:16,327 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519550 2023-11-26 16:01:30,610 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.18 vs. limit=12.0 2023-11-26 16:01:47,776 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2550, loss[loss=0.07558, simple_loss=0.1085, pruned_loss=0.01277, audio_tagging_loss=0.008534, over 15231.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.0894, pruned_loss=0.01202, audio_tagging_loss=0.00887, over 3040123.62 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:01:58,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3463906.6666666665, ans=0.125 2023-11-26 16:02:00,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3463906.6666666665, ans=0.125 2023-11-26 16:02:02,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3463906.6666666665, ans=0.0 2023-11-26 16:02:12,975 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519600 2023-11-26 16:02:40,009 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 8.652e+01 9.307e+01 9.985e+01 1.166e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 16:02:44,317 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2600, loss[loss=0.06002, simple_loss=0.07833, pruned_loss=0.01201, audio_tagging_loss=0.008843, over 15116.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08993, pruned_loss=0.01204, audio_tagging_loss=0.008711, over 3044855.06 frames. 
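The scaling.py:213 lines print the current value (ans) of a ScheduledFloat: a module hyperparameter (balancer probabilities, skip rates, dropout, scale floors) that follows a schedule over batch_count. By batch_count ≈ 3.46e6 most have settled at their final values (0.125, 0.1, 0.2, ...). A minimal sketch, assuming the piecewise-linear interpolation used by icefall's scaling.py; this is not the real class:

    import bisect

    class ScheduledFloatSketch:
        # Piecewise-linear schedule over batch_count, in the spirit of the
        # ScheduledFloat values printed above (a sketch, not icefall's class).
        def __init__(self, *points: tuple[float, float]):
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def value(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # E.g. a dropout decaying from 0.3 to a floor of 0.1 over the first 20k
    # batches reads ans=0.1 at batch_count=3.46e6, like the dropout_p lines.
    sched = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
    print(sched.value(3_460_000.0))  # -> 0.1
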
], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:02:52,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3464173.3333333335, ans=0.2 2023-11-26 16:02:53,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=3464173.3333333335, ans=0.95 2023-11-26 16:03:09,614 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519650 2023-11-26 16:03:40,916 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2650, loss[loss=0.05922, simple_loss=0.06907, pruned_loss=0.01346, audio_tagging_loss=0.01123, over 14661.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08927, pruned_loss=0.01194, audio_tagging_loss=0.008766, over 3039864.37 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:03:42,406 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2023-11-26 16:03:43,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3464506.6666666665, ans=0.125 2023-11-26 16:04:01,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3464573.3333333335, ans=0.125 2023-11-26 16:04:05,233 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519700 2023-11-26 16:04:14,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3464706.6666666665, ans=0.125 2023-11-26 16:04:32,547 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.592e+01 8.790e+01 9.468e+01 1.013e+02 1.366e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-26 16:04:36,895 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2700, loss[loss=0.06516, simple_loss=0.09195, pruned_loss=0.009964, audio_tagging_loss=0.009222, over 15294.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08898, pruned_loss=0.01193, audio_tagging_loss=0.008674, over 3038545.25 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:04:56,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3464906.6666666665, ans=0.125 2023-11-26 16:05:02,156 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519750 2023-11-26 16:05:03,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3464973.3333333335, ans=0.0 2023-11-26 16:05:24,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3465106.6666666665, ans=0.125 2023-11-26 16:05:27,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3465106.6666666665, ans=0.125 2023-11-26 16:05:33,065 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2750, loss[loss=0.07862, simple_loss=0.1036, pruned_loss=0.01763, audio_tagging_loss=0.009203, over 14923.00 frames. ], tot_loss[loss=0.06457, simple_loss=0.0881, pruned_loss=0.01185, audio_tagging_loss=0.008672, over 3033767.73 frames. 
], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:05:57,563 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519800 2023-11-26 16:05:57,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3465306.6666666665, ans=0.1 2023-11-26 16:06:19,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3465440.0, ans=0.0 2023-11-26 16:06:22,656 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 16:06:25,287 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.739e+01 8.811e+01 9.283e+01 1.006e+02 1.287e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-26 16:06:29,587 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2800, loss[loss=0.04474, simple_loss=0.05352, pruned_loss=0.007243, audio_tagging_loss=0.01074, over 17007.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08849, pruned_loss=0.01188, audio_tagging_loss=0.008719, over 3037043.54 frames. ], batch size: 67, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:06:30,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3465506.6666666665, ans=0.125 2023-11-26 16:06:42,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3465573.3333333335, ans=0.125 2023-11-26 16:06:45,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3465573.3333333335, ans=0.04949747468305833 2023-11-26 16:06:54,056 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519850 2023-11-26 16:07:16,072 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.60 vs. limit=12.0 2023-11-26 16:07:18,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3465773.3333333335, ans=0.09899494936611666 2023-11-26 16:07:24,876 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2850, loss[loss=0.06561, simple_loss=0.09871, pruned_loss=0.0108, audio_tagging_loss=0.005459, over 15497.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08881, pruned_loss=0.01184, audio_tagging_loss=0.00873, over 3031475.19 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:07:31,691 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.49 vs. 
limit=5.0 2023-11-26 16:07:50,756 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519900 2023-11-26 16:07:57,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3465973.3333333335, ans=0.125 2023-11-26 16:08:18,470 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.391e+01 8.838e+01 9.404e+01 1.047e+02 1.303e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 16:08:21,838 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2900, loss[loss=0.05554, simple_loss=0.07612, pruned_loss=0.007152, audio_tagging_loss=0.01032, over 16511.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08914, pruned_loss=0.01186, audio_tagging_loss=0.008729, over 3030037.61 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:08:38,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3466240.0, ans=0.1 2023-11-26 16:08:46,458 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 519950 2023-11-26 16:08:46,850 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.10 vs. limit=15.0 2023-11-26 16:08:56,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3466373.3333333335, ans=0.0 2023-11-26 16:09:00,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3466373.3333333335, ans=0.125 2023-11-26 16:09:01,991 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.92 vs. limit=22.5 2023-11-26 16:09:07,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3466440.0, ans=0.2 2023-11-26 16:09:18,650 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 2950, loss[loss=0.068, simple_loss=0.09491, pruned_loss=0.01066, audio_tagging_loss=0.009891, over 14759.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08999, pruned_loss=0.01215, audio_tagging_loss=0.008782, over 3031930.44 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:09:28,903 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.90 vs. limit=22.5 2023-11-26 16:09:43,063 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520000 2023-11-26 16:09:44,362 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-520000.pt 2023-11-26 16:09:48,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3466640.0, ans=0.0 2023-11-26 16:10:12,943 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.519e+01 8.851e+01 9.554e+01 1.014e+02 1.213e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 16:10:15,930 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.33 vs. limit=15.0 2023-11-26 16:10:16,215 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3000, loss[loss=0.05556, simple_loss=0.06783, pruned_loss=0.008766, audio_tagging_loss=0.01287, over 15094.00 frames. 
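The checkpoint-520000.pt save above is a batch-level checkpoint: 520000 is a multiple of this run's save_every_n=4000, while the validation pass a few entries below fires at in-epoch batch 3000, a multiple of valid_interval=3000. A cadence sketch consistent with those two events; the actual conditions live in train_asr.py:

    def maybe_checkpoint_and_validate(batch_idx: int, batch_idx_train: int,
                                      save_every_n: int = 4000,
                                      valid_interval: int = 3000):
        # Checkpoints appear keyed on the global batch index, validation on
        # the in-epoch one; both defaults are this run's settings.
        save = batch_idx_train % save_every_n == 0
        validate = batch_idx % valid_interval == 0
        return save, validate

    # Consistent with this excerpt: save at global batch 520000, then a
    # validation pass at in-epoch batch 3000.
    assert maybe_checkpoint_and_validate(2950, 520000) == (True, False)
    assert maybe_checkpoint_and_validate(3000, 520050)[1] is True
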
], tot_loss[loss=0.06541, simple_loss=0.08914, pruned_loss=0.01203, audio_tagging_loss=0.008809, over 3035823.16 frames. ], batch size: 62, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:10:16,217 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-26 16:10:48,835 INFO [train_asr.py:1267] (0/4) Epoch 44, validation: loss=0.05748, simple_loss=0.05058, pruned_loss=0.005287, audio_tagging_loss=0.02691, over 4681554.00 frames. 2023-11-26 16:10:48,836 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-26 16:10:56,933 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.24 vs. limit=22.5 2023-11-26 16:11:06,524 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.32 vs. limit=15.0 2023-11-26 16:11:13,572 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520050 2023-11-26 16:11:19,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3466973.3333333335, ans=0.0 2023-11-26 16:11:28,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3467040.0, ans=0.2 2023-11-26 16:11:39,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3467106.6666666665, ans=0.1 2023-11-26 16:11:45,575 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3050, loss[loss=0.08727, simple_loss=0.1158, pruned_loss=0.02258, audio_tagging_loss=0.006792, over 15956.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09027, pruned_loss=0.01233, audio_tagging_loss=0.008743, over 3043010.50 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:11:46,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3467173.3333333335, ans=0.0 2023-11-26 16:11:51,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3467173.3333333335, ans=0.0 2023-11-26 16:11:55,817 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.86 vs. limit=15.0 2023-11-26 16:12:09,697 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520100 2023-11-26 16:12:14,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3467306.6666666665, ans=0.1 2023-11-26 16:12:19,145 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 16:12:21,153 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.86 vs. 
limit=12.0 2023-11-26 16:12:30,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3467440.0, ans=0.2 2023-11-26 16:12:37,804 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 8.992e+01 9.720e+01 1.054e+02 1.278e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-26 16:12:41,148 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3100, loss[loss=0.06152, simple_loss=0.07968, pruned_loss=0.01246, audio_tagging_loss=0.009215, over 15622.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09123, pruned_loss=0.01244, audio_tagging_loss=0.008775, over 3046640.53 frames. ], batch size: 62, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:12:47,260 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.61 vs. limit=10.0 2023-11-26 16:13:06,282 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520150 2023-11-26 16:13:08,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3467640.0, ans=0.125 2023-11-26 16:13:13,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3467706.6666666665, ans=0.0 2023-11-26 16:13:20,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3467706.6666666665, ans=0.0 2023-11-26 16:13:27,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3467773.3333333335, ans=0.0 2023-11-26 16:13:36,589 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3150, loss[loss=0.0542, simple_loss=0.07505, pruned_loss=0.006048, audio_tagging_loss=0.01063, over 14615.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09121, pruned_loss=0.01228, audio_tagging_loss=0.008819, over 3045726.96 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 8.0 2023-11-26 16:13:46,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3467840.0, ans=0.125 2023-11-26 16:13:53,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3467906.6666666665, ans=0.125 2023-11-26 16:14:01,386 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520200 2023-11-26 16:14:02,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3467973.3333333335, ans=0.125 2023-11-26 16:14:10,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3468040.0, ans=0.125 2023-11-26 16:14:23,003 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.96 vs. 
limit=10.0 2023-11-26 16:14:26,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3468106.6666666665, ans=0.125 2023-11-26 16:14:31,094 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.410e+01 8.852e+01 9.512e+01 1.032e+02 1.320e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-26 16:14:33,235 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3200, loss[loss=0.06892, simple_loss=0.08932, pruned_loss=0.01484, audio_tagging_loss=0.009416, over 15019.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09182, pruned_loss=0.01249, audio_tagging_loss=0.008856, over 3050779.94 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:14:37,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3468173.3333333335, ans=0.125 2023-11-26 16:14:56,923 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520250 2023-11-26 16:15:10,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3468373.3333333335, ans=0.0 2023-11-26 16:15:13,675 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:15:28,512 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3250, loss[loss=0.05559, simple_loss=0.06398, pruned_loss=0.009168, audio_tagging_loss=0.01443, over 14576.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09018, pruned_loss=0.01228, audio_tagging_loss=0.00899, over 3052838.19 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:15:30,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3468506.6666666665, ans=0.125 2023-11-26 16:15:54,090 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520300 2023-11-26 16:15:58,941 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. limit=6.0 2023-11-26 16:16:21,811 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 8.992e+01 9.345e+01 1.022e+02 1.465e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 16:16:23,914 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3300, loss[loss=0.06202, simple_loss=0.08029, pruned_loss=0.01082, audio_tagging_loss=0.01105, over 15904.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.0902, pruned_loss=0.01226, audio_tagging_loss=0.009002, over 3053195.60 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:16:30,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3468840.0, ans=0.125 2023-11-26 16:16:49,503 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520350 2023-11-26 16:17:01,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3469040.0, ans=0.0 2023-11-26 16:17:14,253 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.07 vs. 
limit=15.0 2023-11-26 16:17:14,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3469106.6666666665, ans=0.125 2023-11-26 16:17:21,079 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3350, loss[loss=0.0644, simple_loss=0.08811, pruned_loss=0.009969, audio_tagging_loss=0.01038, over 15360.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09043, pruned_loss=0.01246, audio_tagging_loss=0.008988, over 3047615.01 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:17:24,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3469173.3333333335, ans=0.125 2023-11-26 16:17:24,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3469173.3333333335, ans=0.125 2023-11-26 16:17:28,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3469173.3333333335, ans=0.1 2023-11-26 16:17:29,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3469173.3333333335, ans=0.04949747468305833 2023-11-26 16:17:40,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3469240.0, ans=0.2 2023-11-26 16:17:44,296 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520400 2023-11-26 16:17:51,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3469306.6666666665, ans=0.1 2023-11-26 16:18:13,708 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.397e+01 8.575e+01 9.293e+01 1.025e+02 1.253e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-26 16:18:15,813 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3400, loss[loss=0.06, simple_loss=0.08007, pruned_loss=0.01123, audio_tagging_loss=0.008739, over 14900.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09032, pruned_loss=0.01249, audio_tagging_loss=0.008879, over 3046434.98 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:18:28,048 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.62 vs. limit=15.0 2023-11-26 16:18:40,614 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520450 2023-11-26 16:19:10,989 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3450, loss[loss=0.06626, simple_loss=0.0938, pruned_loss=0.01167, audio_tagging_loss=0.007693, over 14295.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09077, pruned_loss=0.01257, audio_tagging_loss=0.008778, over 3045575.08 frames. 
], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:19:13,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3469840.0, ans=0.0 2023-11-26 16:19:26,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3469906.6666666665, ans=0.015 2023-11-26 16:19:36,338 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520500 2023-11-26 16:19:49,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3470040.0, ans=0.0 2023-11-26 16:19:50,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3470040.0, ans=0.125 2023-11-26 16:20:05,288 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.719e+01 8.921e+01 9.582e+01 1.025e+02 1.197e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-26 16:20:07,465 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3500, loss[loss=0.07773, simple_loss=0.1091, pruned_loss=0.0159, audio_tagging_loss=0.007252, over 14545.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09035, pruned_loss=0.01252, audio_tagging_loss=0.008663, over 3049347.60 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:20:31,638 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520550 2023-11-26 16:20:35,906 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 16:20:53,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3470440.0, ans=0.125 2023-11-26 16:20:55,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3470440.0, ans=0.0 2023-11-26 16:21:02,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3470506.6666666665, ans=0.2 2023-11-26 16:21:03,046 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3550, loss[loss=0.07211, simple_loss=0.1017, pruned_loss=0.01302, audio_tagging_loss=0.008252, over 15103.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08966, pruned_loss=0.01246, audio_tagging_loss=0.008721, over 3046750.25 frames. 
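grad_scale in the per-batch entries oscillates between 8.0, 16.0 and 32.0. With use_fp16 enabled this is the usual dynamic loss-scaling pattern: halve the scale when a step overflows, double it after a run of clean steps. A sketch in the spirit of torch.cuda.amp.GradScaler; the growth interval and exact policy here are assumptions:

    class LossScaleSketch:
        # Dynamic fp16 loss scaling: halve on overflow, double after
        # `growth_interval` overflow-free steps. This is why grad_scale
        # moves between 8.0, 16.0 and 32.0 across the batches above.
        def __init__(self, scale: float = 8.0, growth_interval: int = 2000):
            self.scale = scale
            self.growth_interval = growth_interval
            self._clean_steps = 0

        def step(self, found_inf: bool) -> float:
            if found_inf:
                self.scale /= 2.0
                self._clean_steps = 0
            else:
                self._clean_steps += 1
                if self._clean_steps == self.growth_interval:
                    self.scale *= 2.0
                    self._clean_steps = 0
            return self.scale
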
], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:21:06,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3470506.6666666665, ans=0.125 2023-11-26 16:21:27,169 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520600 2023-11-26 16:21:35,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3470706.6666666665, ans=0.125 2023-11-26 16:21:37,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3470706.6666666665, ans=0.0 2023-11-26 16:21:55,834 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.188e+01 8.906e+01 9.475e+01 1.015e+02 1.360e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-26 16:21:57,992 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3600, loss[loss=0.06216, simple_loss=0.08494, pruned_loss=0.01027, audio_tagging_loss=0.009419, over 15744.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08806, pruned_loss=0.01215, audio_tagging_loss=0.008697, over 3048464.18 frames. ], batch size: 62, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:22:15,586 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2023-11-26 16:22:17,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3470906.6666666665, ans=0.04949747468305833 2023-11-26 16:22:23,257 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520650 2023-11-26 16:22:24,788 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.54 vs. limit=15.0 2023-11-26 16:22:31,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3471040.0, ans=0.0 2023-11-26 16:22:54,297 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3650, loss[loss=0.06853, simple_loss=0.09604, pruned_loss=0.01275, audio_tagging_loss=0.007761, over 15694.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.0887, pruned_loss=0.0123, audio_tagging_loss=0.008515, over 3046923.34 frames. 
], batch size: 59, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:22:57,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3471173.3333333335, ans=0.0 2023-11-26 16:23:16,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3471306.6666666665, ans=0.0 2023-11-26 16:23:18,327 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520700 2023-11-26 16:23:27,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3471373.3333333335, ans=0.0 2023-11-26 16:23:32,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3471373.3333333335, ans=0.125 2023-11-26 16:23:38,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3471440.0, ans=0.1 2023-11-26 16:23:39,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3471440.0, ans=0.1 2023-11-26 16:23:47,022 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.977e+01 8.866e+01 9.412e+01 1.015e+02 1.534e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-26 16:23:49,726 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3700, loss[loss=0.0804, simple_loss=0.09854, pruned_loss=0.01925, audio_tagging_loss=0.01188, over 14743.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08899, pruned_loss=0.01222, audio_tagging_loss=0.008474, over 3051713.22 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:23:54,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3471506.6666666665, ans=0.0 2023-11-26 16:23:54,668 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.46 vs. limit=15.0 2023-11-26 16:24:04,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3471573.3333333335, ans=0.125 2023-11-26 16:24:13,862 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520750 2023-11-26 16:24:15,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3471640.0, ans=0.2 2023-11-26 16:24:33,399 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.74 vs. limit=15.0 2023-11-26 16:24:35,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3471773.3333333335, ans=0.2 2023-11-26 16:24:44,724 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3750, loss[loss=0.05475, simple_loss=0.07629, pruned_loss=0.006769, audio_tagging_loss=0.009842, over 16100.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08963, pruned_loss=0.01227, audio_tagging_loss=0.008613, over 3055929.41 frames. 
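The Whitening entries compare a measured statistic of a module's activations against a scheduled limit (metric=X vs. limit=Y); the constraint only engages when the metric exceeds the limit, and most entries here sit comfortably below it. The stand-in metric below (largest eigenvalue of the channel covariance over the mean eigenvalue) only illustrates the idea; icefall's exact formula differs:

    import numpy as np

    def whiteness_metric(feats: np.ndarray) -> float:
        # feats: (num_frames, num_channels). Perfectly white features give a
        # value near 1; strongly correlated channels push it up toward and
        # past limits like the logged 15.0. A proxy, not icefall's formula.
        feats = feats - feats.mean(axis=0, keepdims=True)
        cov = feats.T @ feats / len(feats)
        eigs = np.linalg.eigvalsh(cov)
        return float(eigs.max() / eigs.mean())

    rng = np.random.default_rng(0)
    white = rng.standard_normal((1000, 512))
    print(whiteness_metric(white))  # small (~3 here), well under limit=15.0
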
], batch size: 61, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:24:54,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3471906.6666666665, ans=0.0 2023-11-26 16:25:01,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3471906.6666666665, ans=10.0 2023-11-26 16:25:05,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3471906.6666666665, ans=0.125 2023-11-26 16:25:09,435 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520800 2023-11-26 16:25:09,902 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.59 vs. limit=15.0 2023-11-26 16:25:15,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3471973.3333333335, ans=0.125 2023-11-26 16:25:24,031 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 16:25:39,675 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.862e+01 9.098e+01 9.695e+01 1.024e+02 1.279e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-26 16:25:40,796 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3800, loss[loss=0.04895, simple_loss=0.05604, pruned_loss=0.009579, audio_tagging_loss=0.01135, over 14760.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.0895, pruned_loss=0.01229, audio_tagging_loss=0.008743, over 3050353.58 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:25:47,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3472173.3333333335, ans=0.05 2023-11-26 16:26:01,156 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.63 vs. limit=22.5 2023-11-26 16:26:02,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3472306.6666666665, ans=0.125 2023-11-26 16:26:05,836 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520850 2023-11-26 16:26:21,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3472373.3333333335, ans=0.0 2023-11-26 16:26:21,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3472373.3333333335, ans=0.1 2023-11-26 16:26:22,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3472373.3333333335, ans=0.1 2023-11-26 16:26:36,578 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3850, loss[loss=0.07362, simple_loss=0.1043, pruned_loss=0.01128, audio_tagging_loss=0.01018, over 16073.00 frames. 
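
The WARNING above drops an AudioSet cut whose transcript is a fixed dummy sentence: 100 input frames shrink to 23 after the encoder frontend's subsampling, fewer than the 24 BPE tokens, and a transducer loss cannot align fewer output frames than tokens. A sketch of such a filter, assuming the frontend length formula T_out = (T_in - 7) // 4, which reproduces the 100 -> 23 in the warning:

def keep_cut(num_frames: int, tokens: list) -> bool:
    """Reject cuts whose encoder output is shorter than the token
    sequence (a sketch of the filter behind the WARNING lines)."""
    t_out = (num_frames - 7) // 4  # assumed subsampled length
    if t_out < len(tokens):
        print(f"Exclude cut from training. "
              f"Frames (before subsampling): {num_frames}. "
              f"Frames (after subsampling): {t_out}. "
              f"Number of tokens: {len(tokens)}")
        return False
    return True

keep_cut(100, ["tok"] * 24)  # the case above: 23 < 24, so excluded
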
], tot_loss[loss=0.06575, simple_loss=0.08929, pruned_loss=0.01221, audio_tagging_loss=0.008897, over 3053677.96 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:26:57,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3472640.0, ans=0.2 2023-11-26 16:26:58,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3472640.0, ans=0.125 2023-11-26 16:27:01,206 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520900 2023-11-26 16:27:08,785 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.35 vs. limit=22.5 2023-11-26 16:27:30,926 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.762e+01 8.861e+01 9.516e+01 1.005e+02 1.326e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-26 16:27:32,023 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3900, loss[loss=0.05894, simple_loss=0.08294, pruned_loss=0.009052, audio_tagging_loss=0.008413, over 15210.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.0882, pruned_loss=0.01212, audio_tagging_loss=0.008988, over 3052648.41 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:27:35,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3472840.0, ans=0.05 2023-11-26 16:27:39,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3472840.0, ans=0.125 2023-11-26 16:27:43,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3472906.6666666665, ans=0.125 2023-11-26 16:27:45,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3472906.6666666665, ans=0.125 2023-11-26 16:27:57,179 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 520950 2023-11-26 16:28:00,703 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.26 vs. limit=22.5 2023-11-26 16:28:06,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3473040.0, ans=0.0 2023-11-26 16:28:21,392 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:28:21,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3473106.6666666665, ans=0.0 2023-11-26 16:28:27,732 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.99 vs. limit=12.0 2023-11-26 16:28:28,140 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 3950, loss[loss=0.05945, simple_loss=0.07237, pruned_loss=0.01224, audio_tagging_loss=0.01102, over 14662.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08779, pruned_loss=0.01204, audio_tagging_loss=0.009046, over 3046100.26 frames. 
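
Each train_asr.py:1235 entry prints the per-frame loss for the current batch (loss[...]) and a running aggregate over roughly three million recent frames (tot_loss[...]), split into the pruned-transducer terms (simple_loss from the simple joiner that bounds the pruning, pruned_loss from the full joiner on the pruned lattice) plus the audio-tagging distillation term. A sketch of how the printed components could combine into the scalar actually optimized; the scale values here are assumptions, not numbers read from this run:

import torch

def combine_losses(simple_loss: torch.Tensor,
                   pruned_loss: torch.Tensor,
                   audio_tagging_loss: torch.Tensor,
                   simple_loss_scale: float = 0.5,
                   audio_tagging_loss_scale: float = 1.0) -> torch.Tensor:
    """Weighted sum of the logged loss terms (a sketch): the simple
    (non-pruned) transducer loss is down-weighted, and the audio
    tagging KD loss enters with its own scale."""
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)
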
], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:28:39,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3473240.0, ans=0.125 2023-11-26 16:28:40,828 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.68 vs. limit=15.0 2023-11-26 16:28:46,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3473240.0, ans=0.2 2023-11-26 16:28:51,995 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521000 2023-11-26 16:29:01,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3473373.3333333335, ans=0.125 2023-11-26 16:29:22,814 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 9.021e+01 9.497e+01 1.017e+02 1.308e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-26 16:29:23,983 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4000, loss[loss=0.05848, simple_loss=0.07251, pruned_loss=0.01265, audio_tagging_loss=0.00957, over 14138.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08816, pruned_loss=0.01219, audio_tagging_loss=0.009123, over 3039154.09 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:29:26,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3473506.6666666665, ans=0.0 2023-11-26 16:29:27,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3473506.6666666665, ans=0.2 2023-11-26 16:29:48,264 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521050 2023-11-26 16:30:12,873 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.37 vs. limit=15.0 2023-11-26 16:30:19,802 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4050, loss[loss=0.06941, simple_loss=0.09322, pruned_loss=0.01208, audio_tagging_loss=0.01071, over 14690.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08866, pruned_loss=0.01219, audio_tagging_loss=0.009137, over 3040069.03 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:30:22,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3473840.0, ans=0.1 2023-11-26 16:30:23,562 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 16:30:44,945 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521100 2023-11-26 16:30:46,649 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.59 vs. 
limit=15.0 2023-11-26 16:30:50,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3473973.3333333335, ans=0.0 2023-11-26 16:31:14,311 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.312e+01 8.820e+01 9.465e+01 1.007e+02 1.196e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 16:31:15,938 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4100, loss[loss=0.07105, simple_loss=0.1051, pruned_loss=0.01083, audio_tagging_loss=0.007678, over 13965.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08841, pruned_loss=0.01203, audio_tagging_loss=0.009117, over 3038543.64 frames. ], batch size: 52, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:31:18,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3474173.3333333335, ans=0.09899494936611666 2023-11-26 16:31:40,672 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521150 2023-11-26 16:31:41,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3474306.6666666665, ans=0.125 2023-11-26 16:31:42,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3474306.6666666665, ans=0.0 2023-11-26 16:31:51,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3474373.3333333335, ans=0.0 2023-11-26 16:31:57,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3474373.3333333335, ans=0.125 2023-11-26 16:32:00,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3474440.0, ans=0.2 2023-11-26 16:32:09,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3474440.0, ans=0.125 2023-11-26 16:32:12,738 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4150, loss[loss=0.06647, simple_loss=0.097, pruned_loss=0.01062, audio_tagging_loss=0.007351, over 15618.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08837, pruned_loss=0.01188, audio_tagging_loss=0.008991, over 3039575.86 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:32:17,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3474506.6666666665, ans=0.2 2023-11-26 16:32:19,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3474506.6666666665, ans=0.1 2023-11-26 16:32:35,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3474640.0, ans=0.1 2023-11-26 16:32:36,822 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521200 2023-11-26 16:32:40,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3474640.0, ans=0.0 2023-11-26 16:32:54,685 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 16:33:04,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3474773.3333333335, ans=0.125 2023-11-26 16:33:07,360 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.415e+01 9.007e+01 9.363e+01 1.013e+02 1.321e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 16:33:07,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=3474840.0, ans=15.0 2023-11-26 16:33:08,497 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4200, loss[loss=0.05567, simple_loss=0.08332, pruned_loss=0.008076, audio_tagging_loss=0.005936, over 15522.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08904, pruned_loss=0.01182, audio_tagging_loss=0.008773, over 3041368.10 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:33:10,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3474840.0, ans=0.0 2023-11-26 16:33:28,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3474906.6666666665, ans=0.125 2023-11-26 16:33:33,766 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521250 2023-11-26 16:34:04,096 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4250, loss[loss=0.062, simple_loss=0.08304, pruned_loss=0.01301, audio_tagging_loss=0.007472, over 14905.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08912, pruned_loss=0.01185, audio_tagging_loss=0.008658, over 3046696.36 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:34:22,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3475240.0, ans=0.2 2023-11-26 16:34:28,591 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521300 2023-11-26 16:34:33,367 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.96 vs. limit=15.0 2023-11-26 16:34:40,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3475373.3333333335, ans=0.125 2023-11-26 16:34:59,003 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.776e+01 8.954e+01 9.475e+01 1.016e+02 1.438e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-26 16:35:00,219 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4300, loss[loss=0.05823, simple_loss=0.07363, pruned_loss=0.01113, audio_tagging_loss=0.01028, over 15282.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08978, pruned_loss=0.01209, audio_tagging_loss=0.008561, over 3048505.94 frames. 
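
The Whitening entries compare a per-module statistic against a limit (6.0 for the grouped attention keys, 15.0 or 22.5 elsewhere in the entries above). A metric of this kind equals 1.0 when the activations are perfectly white, i.e. their channel covariance is proportional to the identity, and grows as channels become correlated or unequal in scale; presumably a corrective penalty kicks in past the limit. One way to compute such a statistic from covariance eigenvalues, as an illustration of the idea rather than the exact formula in scaling.py:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """mean(eig^2) / mean(eig)^2 of the channel covariance, averaged
    over channel groups: 1.0 iff covariance = c * I (a sketch)."""
    num_frames, _ = x.shape
    metrics = []
    for g in x.chunk(num_groups, dim=1):   # split channels into groups
        g = g - g.mean(dim=0)              # center each channel
        cov = (g.t() @ g) / num_frames     # channel covariance
        eigs = torch.linalg.eigvalsh(cov)  # symmetric, so eigvalsh
        metrics.append((eigs ** 2).mean() / eigs.mean() ** 2)
    return float(torch.stack(metrics).mean())

x = torch.randn(1000, 512)
print(whitening_metric(x))              # modestly above 1 (white noise)
print(whitening_metric(x @ torch.randn(512, 512)))  # larger when mixed
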
], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:35:23,619 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521350 2023-11-26 16:35:32,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3475706.6666666665, ans=0.2 2023-11-26 16:35:43,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3475773.3333333335, ans=0.125 2023-11-26 16:35:55,107 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4350, loss[loss=0.07572, simple_loss=0.1098, pruned_loss=0.01379, audio_tagging_loss=0.007009, over 16202.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.09011, pruned_loss=0.01219, audio_tagging_loss=0.008556, over 3043203.32 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:36:19,549 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521400 2023-11-26 16:36:21,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3475973.3333333335, ans=0.125 2023-11-26 16:36:23,782 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:36:45,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3476106.6666666665, ans=0.125 2023-11-26 16:36:49,101 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 8.904e+01 9.369e+01 1.014e+02 1.389e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-26 16:36:50,193 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4400, loss[loss=0.07375, simple_loss=0.1025, pruned_loss=0.01447, audio_tagging_loss=0.008013, over 14592.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09041, pruned_loss=0.01219, audio_tagging_loss=0.008618, over 3045025.23 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:37:14,875 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521450 2023-11-26 16:37:24,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3476373.3333333335, ans=0.125 2023-11-26 16:37:46,787 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4450, loss[loss=0.06762, simple_loss=0.08841, pruned_loss=0.01028, audio_tagging_loss=0.01314, over 16557.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09035, pruned_loss=0.01234, audio_tagging_loss=0.008611, over 3046928.73 frames. 
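
The loss figures in these entries are per-frame values: each component is accumulated as a sum together with the number of frames it covers, and what gets printed is sum / frames, hence the "over N frames" annotation on every loss[...] / tot_loss[...] group. A sketch of such a tracker, assuming a plain dict-based accumulator; the training script's own tracker may differ in detail:

class MetricsTracker(dict):
    """Accumulate loss sums and frame counts; render per-frame values
    (a sketch of the bookkeeping behind 'over N frames')."""

    def __add__(self, other: "MetricsTracker") -> "MetricsTracker":
        out = MetricsTracker(self)
        for k, v in other.items():
            out[k] = out.get(k, 0.0) + v
        return out

    def __str__(self) -> str:
        frames = self["frames"]
        parts = [f"{k}={v / frames:.4g}" for k, v in self.items()
                 if k != "frames"]
        return ", ".join(parts) + f", over {frames:.2f} frames"

batch = MetricsTracker(loss=980.0, simple_loss=1350.0, frames=15170.0)
tot = MetricsTracker(frames=0.0) + batch  # running aggregate over batches
print(f"loss[{batch}], tot_loss[{tot}]")
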
], batch size: 60, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:37:46,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3476506.6666666665, ans=0.125 2023-11-26 16:37:52,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3476506.6666666665, ans=0.1 2023-11-26 16:37:54,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=3476506.6666666665, ans=15.0 2023-11-26 16:37:57,652 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:38:10,259 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521500 2023-11-26 16:38:32,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3476773.3333333335, ans=0.0 2023-11-26 16:38:40,517 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.377e+01 8.912e+01 9.426e+01 1.014e+02 1.545e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 16:38:41,579 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4500, loss[loss=0.08037, simple_loss=0.115, pruned_loss=0.01533, audio_tagging_loss=0.007526, over 15564.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09119, pruned_loss=0.01255, audio_tagging_loss=0.008587, over 3053238.34 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:38:43,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3476840.0, ans=0.0 2023-11-26 16:38:44,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3476840.0, ans=0.2 2023-11-26 16:38:53,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3476906.6666666665, ans=0.125 2023-11-26 16:38:53,807 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.96 vs. limit=22.5 2023-11-26 16:38:56,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3476906.6666666665, ans=0.1 2023-11-26 16:39:05,309 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521550 2023-11-26 16:39:17,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3477040.0, ans=0.0 2023-11-26 16:39:20,604 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.99 vs. limit=15.0 2023-11-26 16:39:27,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3477106.6666666665, ans=0.1 2023-11-26 16:39:27,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3477106.6666666665, ans=0.125 2023-11-26 16:39:29,230 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.93 vs. 
limit=15.0 2023-11-26 16:39:29,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3477106.6666666665, ans=0.125 2023-11-26 16:39:36,117 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4550, loss[loss=0.05961, simple_loss=0.08185, pruned_loss=0.009163, audio_tagging_loss=0.009523, over 15019.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09099, pruned_loss=0.01254, audio_tagging_loss=0.008522, over 3053182.73 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:39:42,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3477173.3333333335, ans=0.125 2023-11-26 16:39:57,428 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.60 vs. limit=15.0 2023-11-26 16:40:01,318 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521600 2023-11-26 16:40:10,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3477373.3333333335, ans=0.125 2023-11-26 16:40:20,739 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 16:40:30,850 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.935e+01 8.644e+01 9.356e+01 1.024e+02 1.228e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 16:40:31,932 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4600, loss[loss=0.06447, simple_loss=0.0808, pruned_loss=0.01316, audio_tagging_loss=0.01091, over 14388.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09035, pruned_loss=0.0125, audio_tagging_loss=0.008657, over 3042178.82 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:40:36,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3477506.6666666665, ans=0.09899494936611666 2023-11-26 16:40:55,070 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:40:57,094 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521650 2023-11-26 16:40:58,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3477640.0, ans=0.125 2023-11-26 16:41:11,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3477706.6666666665, ans=0.1 2023-11-26 16:41:24,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3477773.3333333335, ans=0.05 2023-11-26 16:41:28,635 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4650, loss[loss=0.06072, simple_loss=0.083, pruned_loss=0.01092, audio_tagging_loss=0.008295, over 14348.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08958, pruned_loss=0.01235, audio_tagging_loss=0.008721, over 3039598.49 frames. 
], batch size: 54, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:41:36,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3477840.0, ans=0.0 2023-11-26 16:41:52,944 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521700 2023-11-26 16:41:54,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3477973.3333333335, ans=0.1 2023-11-26 16:42:00,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3477973.3333333335, ans=0.125 2023-11-26 16:42:03,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=3478040.0, ans=0.95 2023-11-26 16:42:10,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3478040.0, ans=0.1 2023-11-26 16:42:21,804 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:42:23,737 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.839e+01 9.612e+01 1.022e+02 1.375e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-26 16:42:23,765 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4700, loss[loss=0.07194, simple_loss=0.1043, pruned_loss=0.01324, audio_tagging_loss=0.006525, over 15387.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08868, pruned_loss=0.01218, audio_tagging_loss=0.008877, over 3039789.00 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:42:36,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3478240.0, ans=0.1 2023-11-26 16:42:39,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3478240.0, ans=0.125 2023-11-26 16:42:49,022 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521750 2023-11-26 16:42:52,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3478306.6666666665, ans=0.2 2023-11-26 16:43:04,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3478373.3333333335, ans=0.1 2023-11-26 16:43:10,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3478440.0, ans=0.0 2023-11-26 16:43:19,239 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4750, loss[loss=0.06278, simple_loss=0.08948, pruned_loss=0.008129, audio_tagging_loss=0.009905, over 16430.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08886, pruned_loss=0.01213, audio_tagging_loss=0.008874, over 3043457.06 frames. 
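
The grad_scale field flipping between 16.0 and 32.0 in these entries is the dynamic loss scale of fp16 training: gradients are computed on a scaled loss so they stay representable in half precision, the scale doubles after a long enough run of overflow-free steps, and it halves whenever an inf/nan gradient is detected, so it hovers around the largest value the model tolerates. The standard PyTorch pattern, sketched with illustrative settings:

import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0,       # the magnitude seen as grad_scale here
    growth_factor=2.0,     # double after enough clean steps
    backoff_factor=0.5,    # halve on overflow
    growth_interval=2000,  # clean steps required before growing
)

def train_step(model, optimizer, criterion, batch, device):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(batch["inputs"].to(device)),
                         batch["targets"].to(device))
    scaler.scale(loss).backward()  # backprop through the scaled loss
    scaler.step(optimizer)         # unscales; skips the step on overflow
    scaler.update()                # grow or back off the scale
    return scaler.get_scale()      # the value logged as grad_scale
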
], batch size: 63, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:43:20,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3478506.6666666665, ans=0.125 2023-11-26 16:43:32,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3478573.3333333335, ans=0.125 2023-11-26 16:43:32,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3478573.3333333335, ans=0.2 2023-11-26 16:43:38,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3478573.3333333335, ans=0.09899494936611666 2023-11-26 16:43:43,693 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521800 2023-11-26 16:44:14,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3478840.0, ans=0.0 2023-11-26 16:44:15,711 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4800, loss[loss=0.07108, simple_loss=0.106, pruned_loss=0.01133, audio_tagging_loss=0.006775, over 14877.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08848, pruned_loss=0.01203, audio_tagging_loss=0.008953, over 3043436.31 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:44:16,765 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.694e+01 8.936e+01 9.415e+01 1.023e+02 1.286e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-26 16:44:38,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3478973.3333333335, ans=0.125 2023-11-26 16:44:39,426 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521850 2023-11-26 16:44:39,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3478973.3333333335, ans=0.1 2023-11-26 16:44:53,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3479040.0, ans=0.125 2023-11-26 16:44:55,804 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2023-11-26 16:44:56,710 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.08 vs. limit=15.0 2023-11-26 16:45:11,482 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4850, loss[loss=0.05266, simple_loss=0.06614, pruned_loss=0.006646, audio_tagging_loss=0.01295, over 14633.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08928, pruned_loss=0.01207, audio_tagging_loss=0.009137, over 3042277.24 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:45:11,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3479173.3333333335, ans=0.125 2023-11-26 16:45:18,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3479173.3333333335, ans=0.125 2023-11-26 16:45:26,425 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.36 vs. 
limit=10.0 2023-11-26 16:45:30,087 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.91 vs. limit=15.0 2023-11-26 16:45:36,672 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521900 2023-11-26 16:45:53,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3479373.3333333335, ans=0.035 2023-11-26 16:45:53,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3479373.3333333335, ans=0.125 2023-11-26 16:46:01,233 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.05 vs. limit=15.0 2023-11-26 16:46:07,657 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4900, loss[loss=0.06783, simple_loss=0.1013, pruned_loss=0.00979, audio_tagging_loss=0.007411, over 15227.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08982, pruned_loss=0.01222, audio_tagging_loss=0.009022, over 3037991.08 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:46:08,678 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.702e+01 8.681e+01 9.501e+01 1.005e+02 1.327e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 16:46:08,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3479506.6666666665, ans=0.125 2023-11-26 16:46:32,677 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 521950 2023-11-26 16:46:50,484 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.37 vs. limit=15.0 2023-11-26 16:47:03,996 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 4950, loss[loss=0.04672, simple_loss=0.05502, pruned_loss=0.008618, audio_tagging_loss=0.01059, over 13937.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08962, pruned_loss=0.01227, audio_tagging_loss=0.008845, over 3038972.91 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:47:27,914 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522000 2023-11-26 16:48:00,041 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5000, loss[loss=0.08505, simple_loss=0.116, pruned_loss=0.02199, audio_tagging_loss=0.005085, over 15175.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08915, pruned_loss=0.01215, audio_tagging_loss=0.008728, over 3037235.16 frames. 
], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:48:01,118 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.774e+01 8.862e+01 9.666e+01 1.035e+02 1.226e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-26 16:48:14,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3480240.0, ans=0.1 2023-11-26 16:48:25,373 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522050 2023-11-26 16:48:40,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3480373.3333333335, ans=0.1 2023-11-26 16:48:42,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3480373.3333333335, ans=0.0 2023-11-26 16:48:55,718 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5050, loss[loss=0.06704, simple_loss=0.0949, pruned_loss=0.01102, audio_tagging_loss=0.008571, over 15142.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08866, pruned_loss=0.01208, audio_tagging_loss=0.008637, over 3037885.98 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:48:56,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3480506.6666666665, ans=0.125 2023-11-26 16:48:59,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3480506.6666666665, ans=0.1 2023-11-26 16:49:09,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3480573.3333333335, ans=0.0 2023-11-26 16:49:09,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3480573.3333333335, ans=0.125 2023-11-26 16:49:10,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3480573.3333333335, ans=0.0 2023-11-26 16:49:21,062 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522100 2023-11-26 16:49:42,859 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.29 vs. limit=12.0 2023-11-26 16:49:53,098 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5100, loss[loss=0.05349, simple_loss=0.06782, pruned_loss=0.01045, audio_tagging_loss=0.009125, over 16147.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08838, pruned_loss=0.01207, audio_tagging_loss=0.008586, over 3034447.98 frames. 
], batch size: 62, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:49:53,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3480840.0, ans=0.0 2023-11-26 16:49:54,126 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 8.678e+01 9.277e+01 1.001e+02 1.240e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 16:50:17,099 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522150 2023-11-26 16:50:21,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3480973.3333333335, ans=0.0 2023-11-26 16:50:26,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=3481040.0, ans=0.1 2023-11-26 16:50:29,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3481040.0, ans=0.0 2023-11-26 16:50:38,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3481106.6666666665, ans=0.125 2023-11-26 16:50:39,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3481106.6666666665, ans=0.125 2023-11-26 16:50:46,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3481106.6666666665, ans=0.1 2023-11-26 16:50:48,532 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5150, loss[loss=0.03558, simple_loss=0.04592, pruned_loss=0.004133, audio_tagging_loss=0.008488, over 14110.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08923, pruned_loss=0.01223, audio_tagging_loss=0.008552, over 3039658.65 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:51:01,198 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.66 vs. limit=22.5 2023-11-26 16:51:14,174 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522200 2023-11-26 16:51:26,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3481373.3333333335, ans=0.0 2023-11-26 16:51:28,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3481373.3333333335, ans=0.2 2023-11-26 16:51:44,734 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5200, loss[loss=0.06829, simple_loss=0.09363, pruned_loss=0.0129, audio_tagging_loss=0.008573, over 16111.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08899, pruned_loss=0.01216, audio_tagging_loss=0.00854, over 3042044.46 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:51:45,725 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.218e+01 8.774e+01 9.486e+01 1.034e+02 1.875e+02, threshold=1.897e+02, percent-clipped=1.0 2023-11-26 16:52:02,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3481573.3333333335, ans=0.0 2023-11-26 16:52:09,795 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522250 2023-11-26 16:52:36,184 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.47 vs. 
limit=15.0 2023-11-26 16:52:37,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3481773.3333333335, ans=0.125 2023-11-26 16:52:41,982 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5250, loss[loss=0.06728, simple_loss=0.09457, pruned_loss=0.01366, audio_tagging_loss=0.006334, over 14671.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08992, pruned_loss=0.01226, audio_tagging_loss=0.008449, over 3039831.18 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:52:56,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3481906.6666666665, ans=0.1 2023-11-26 16:53:01,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3481906.6666666665, ans=0.125 2023-11-26 16:53:05,973 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522300 2023-11-26 16:53:17,612 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.52 vs. limit=15.0 2023-11-26 16:53:29,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3482106.6666666665, ans=0.0 2023-11-26 16:53:37,439 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5300, loss[loss=0.06117, simple_loss=0.08201, pruned_loss=0.0109, audio_tagging_loss=0.009262, over 13937.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09014, pruned_loss=0.01226, audio_tagging_loss=0.008479, over 3034392.36 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:53:39,544 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.063e+01 8.820e+01 9.463e+01 1.024e+02 1.274e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 16:53:45,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3482173.3333333335, ans=0.035 2023-11-26 16:54:02,618 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522350 2023-11-26 16:54:16,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3482373.3333333335, ans=0.2 2023-11-26 16:54:24,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3482440.0, ans=0.125 2023-11-26 16:54:28,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3482440.0, ans=0.0 2023-11-26 16:54:33,247 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5350, loss[loss=0.08021, simple_loss=0.1217, pruned_loss=0.01409, audio_tagging_loss=0.005273, over 15173.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.09023, pruned_loss=0.01226, audio_tagging_loss=0.008506, over 3038945.12 frames. ], batch size: 52, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:54:33,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3482506.6666666665, ans=0.0 2023-11-26 16:54:33,724 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.96 vs. 
limit=15.0 2023-11-26 16:54:35,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3482506.6666666665, ans=0.125 2023-11-26 16:54:53,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3482573.3333333335, ans=0.125 2023-11-26 16:54:58,401 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522400 2023-11-26 16:55:30,687 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5400, loss[loss=0.05279, simple_loss=0.07159, pruned_loss=0.00758, audio_tagging_loss=0.009416, over 16723.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09045, pruned_loss=0.0124, audio_tagging_loss=0.008536, over 3044701.45 frames. ], batch size: 65, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:55:32,789 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.955e+01 9.129e+01 9.512e+01 1.019e+02 1.244e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-26 16:55:36,837 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.01 vs. limit=15.0 2023-11-26 16:55:50,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3482906.6666666665, ans=0.1 2023-11-26 16:55:51,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3482973.3333333335, ans=0.2 2023-11-26 16:55:54,156 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522450 2023-11-26 16:56:09,236 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.46 vs. limit=12.0 2023-11-26 16:56:26,066 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5450, loss[loss=0.06097, simple_loss=0.08262, pruned_loss=0.00965, audio_tagging_loss=0.01001, over 14008.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.09003, pruned_loss=0.01232, audio_tagging_loss=0.008584, over 3042093.32 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:56:50,420 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522500 2023-11-26 16:56:50,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3483306.6666666665, ans=0.125 2023-11-26 16:56:53,516 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.08 vs. limit=22.5 2023-11-26 16:57:01,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3483373.3333333335, ans=0.125 2023-11-26 16:57:03,731 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.72 vs. limit=15.0 2023-11-26 16:57:05,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3483373.3333333335, ans=0.0 2023-11-26 16:57:21,247 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5500, loss[loss=0.06208, simple_loss=0.08646, pruned_loss=0.01075, audio_tagging_loss=0.008106, over 14627.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09018, pruned_loss=0.01228, audio_tagging_loss=0.008668, over 3036441.25 frames. 
], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:57:23,310 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.125e+01 8.753e+01 9.597e+01 1.033e+02 1.583e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-26 16:57:24,598 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:57:37,183 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.63 vs. limit=15.0 2023-11-26 16:57:39,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3483573.3333333335, ans=0.125 2023-11-26 16:57:46,841 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522550 2023-11-26 16:58:03,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3483706.6666666665, ans=0.0 2023-11-26 16:58:05,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3483773.3333333335, ans=0.2 2023-11-26 16:58:18,145 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5550, loss[loss=0.07746, simple_loss=0.1146, pruned_loss=0.01321, audio_tagging_loss=0.006937, over 16107.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09072, pruned_loss=0.01241, audio_tagging_loss=0.008702, over 3038364.83 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:58:18,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3483840.0, ans=0.125 2023-11-26 16:58:19,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3483840.0, ans=0.125 2023-11-26 16:58:21,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3483840.0, ans=0.2 2023-11-26 16:58:23,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3483840.0, ans=10.0 2023-11-26 16:58:41,945 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522600 2023-11-26 16:58:47,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3483973.3333333335, ans=0.125 2023-11-26 16:58:57,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3484040.0, ans=0.0 2023-11-26 16:59:03,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3484106.6666666665, ans=0.125 2023-11-26 16:59:08,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3484106.6666666665, ans=0.0 2023-11-26 16:59:13,966 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5600, loss[loss=0.0534, simple_loss=0.06436, pruned_loss=0.01248, audio_tagging_loss=0.008739, over 15249.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09119, pruned_loss=0.01242, audio_tagging_loss=0.008683, over 3042068.89 frames. 
], batch size: 59, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:59:16,060 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.541e+01 8.836e+01 9.428e+01 1.004e+02 1.214e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 16:59:16,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3484173.3333333335, ans=0.2 2023-11-26 16:59:20,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3484173.3333333335, ans=0.125 2023-11-26 16:59:20,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3484173.3333333335, ans=0.125 2023-11-26 16:59:34,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3484306.6666666665, ans=0.0 2023-11-26 16:59:38,569 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522650 2023-11-26 16:59:53,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3484373.3333333335, ans=0.125 2023-11-26 16:59:56,686 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 16:59:57,983 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:59:58,031 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:00:09,440 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5650, loss[loss=0.06305, simple_loss=0.08548, pruned_loss=0.009045, audio_tagging_loss=0.01127, over 15138.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09087, pruned_loss=0.01238, audio_tagging_loss=0.008833, over 3045034.63 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 17:00:34,601 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522700 2023-11-26 17:00:46,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3484706.6666666665, ans=0.1 2023-11-26 17:00:59,616 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.92 vs. limit=15.0 2023-11-26 17:01:05,349 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5700, loss[loss=0.0695, simple_loss=0.09372, pruned_loss=0.01427, audio_tagging_loss=0.008376, over 14569.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09126, pruned_loss=0.01257, audio_tagging_loss=0.008831, over 3042602.77 frames. 
], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 17:01:08,014 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.017e+01 9.014e+01 9.489e+01 1.005e+02 1.284e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-26 17:01:23,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3484906.6666666665, ans=0.125 2023-11-26 17:01:29,924 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522750 2023-11-26 17:01:34,862 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.00 vs. limit=22.5 2023-11-26 17:01:46,712 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.51 vs. limit=10.0 2023-11-26 17:01:50,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3485106.6666666665, ans=0.0 2023-11-26 17:02:01,820 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5750, loss[loss=0.06149, simple_loss=0.08061, pruned_loss=0.01136, audio_tagging_loss=0.009818, over 15535.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09081, pruned_loss=0.01257, audio_tagging_loss=0.008708, over 3034943.23 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 17:02:25,408 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522800 2023-11-26 17:02:26,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3485306.6666666665, ans=0.1 2023-11-26 17:02:33,035 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.51 vs. limit=15.0 2023-11-26 17:02:57,294 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5800, loss[loss=0.08549, simple_loss=0.1264, pruned_loss=0.01786, audio_tagging_loss=0.004436, over 15650.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09003, pruned_loss=0.01243, audio_tagging_loss=0.008629, over 3033520.68 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 17:02:59,440 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.838e+01 8.825e+01 9.413e+01 1.036e+02 1.628e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-26 17:03:05,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3485506.6666666665, ans=0.1 2023-11-26 17:03:23,242 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522850 2023-11-26 17:03:29,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3485640.0, ans=0.125 2023-11-26 17:03:49,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=3485773.3333333335, ans=0.1 2023-11-26 17:03:53,386 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5850, loss[loss=0.057, simple_loss=0.06503, pruned_loss=0.01474, audio_tagging_loss=0.009742, over 14981.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09107, pruned_loss=0.01262, audio_tagging_loss=0.008499, over 3033014.94 frames. 
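
The learning rate printed with each batch decays very slowly at this depth of training, ticking from 1.54e-03 down to the 1.53e-03 visible a few entries below; that behaviour is consistent with a schedule that decays as an inverse power of both the batch count and the epoch. A sketch with that shape; the functional form and all constants here are assumptions chosen to reproduce the logged magnitude, not values read from the run:

def learning_rate(base_lr: float, batch: int, epoch: float,
                  lr_batches: float = 7500.0,
                  lr_epochs: float = 3.5) -> float:
    """Inverse-fourth-root decay in both batch and epoch (a sketch):
    near-constant within an epoch this late in training."""
    batch_part = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_part = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_part * epoch_part

# with an assumed base rate of 0.045, half a million batches into
# epoch 44 this lands at roughly the 1.5e-03 order logged above
print(round(learning_rate(0.045, 522900, 44), 5))  # -> 0.00152
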
], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:04:02,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3485840.0, ans=0.125 2023-11-26 17:04:13,875 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.60 vs. limit=22.5 2023-11-26 17:04:18,744 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522900 2023-11-26 17:04:21,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3485973.3333333335, ans=0.0 2023-11-26 17:04:44,211 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.64 vs. limit=10.0 2023-11-26 17:04:44,291 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.85 vs. limit=12.0 2023-11-26 17:04:50,450 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5900, loss[loss=0.05105, simple_loss=0.06317, pruned_loss=0.00681, audio_tagging_loss=0.01265, over 17050.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09128, pruned_loss=0.01258, audio_tagging_loss=0.008508, over 3039310.18 frames. ], batch size: 65, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:04:52,510 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.809e+01 9.343e+01 1.010e+02 1.341e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 17:04:54,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3486173.3333333335, ans=0.125 2023-11-26 17:05:08,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3486240.0, ans=10.0 2023-11-26 17:05:08,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3486240.0, ans=0.05 2023-11-26 17:05:09,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3486240.0, ans=0.0 2023-11-26 17:05:13,760 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 522950 2023-11-26 17:05:26,314 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.24 vs. limit=22.5 2023-11-26 17:05:27,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3486373.3333333335, ans=0.125 2023-11-26 17:05:29,663 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.41 vs. limit=15.0 2023-11-26 17:05:45,293 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 5950, loss[loss=0.06901, simple_loss=0.09689, pruned_loss=0.01161, audio_tagging_loss=0.008957, over 14668.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09117, pruned_loss=0.01246, audio_tagging_loss=0.008503, over 3044995.96 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:05:53,154 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.03 vs. 
limit=15.0 2023-11-26 17:06:08,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3486640.0, ans=0.0 2023-11-26 17:06:10,245 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523000 2023-11-26 17:06:40,781 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6000, loss[loss=0.04411, simple_loss=0.05795, pruned_loss=0.007903, audio_tagging_loss=0.007228, over 15301.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09046, pruned_loss=0.01232, audio_tagging_loss=0.008577, over 3042519.64 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:06:40,783 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-26 17:07:13,759 INFO [train_asr.py:1267] (0/4) Epoch 44, validation: loss=0.05792, simple_loss=0.05061, pruned_loss=0.005328, audio_tagging_loss=0.02728, over 4681554.00 frames. 2023-11-26 17:07:13,759 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-26 17:07:16,876 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.326e+01 8.949e+01 9.418e+01 1.019e+02 1.469e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 17:07:17,646 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.82 vs. limit=15.0 2023-11-26 17:07:25,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3486906.6666666665, ans=0.05 2023-11-26 17:07:29,857 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:07:37,170 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523050 2023-11-26 17:07:51,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3487040.0, ans=0.125 2023-11-26 17:07:56,222 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 17:07:56,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3487040.0, ans=0.125 2023-11-26 17:08:05,291 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.96 vs. limit=22.5 2023-11-26 17:08:08,306 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.63 vs. limit=15.0 2023-11-26 17:08:08,760 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6050, loss[loss=0.07579, simple_loss=0.1117, pruned_loss=0.01175, audio_tagging_loss=0.00819, over 16385.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.0899, pruned_loss=0.01213, audio_tagging_loss=0.00857, over 3043616.03 frames. 
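The validation block above ("Computing validation loss" ... "Maximum memory allocated so far is 25978MB") can be reproduced in outline as below. torch.cuda.max_memory_allocated() is the real API behind the memory line; the frame-weighted aggregation and the compute_loss signature are assumptions for illustration.

import torch

def run_validation(model, valid_dl, compute_loss) -> None:
    model.eval()
    tot: dict[str, float] = {}
    frames = 0.0
    with torch.no_grad():
        for batch in valid_dl:
            # compute_loss is assumed to return (per-batch loss sums, num frames).
            losses, n = compute_loss(model, batch)
            frames += n
            for k, v in losses.items():
                tot[k] = tot.get(k, 0.0) + float(v)
    body = ", ".join(f"{k}={v / frames:.4g}" for k, v in tot.items())
    print(f"validation: {body}, over {frames:.2f} frames.")
    mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
    print(f"Maximum memory allocated so far is {mb}MB")
    model.train()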
], batch size: 61, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:08:22,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3487240.0, ans=0.125 2023-11-26 17:08:33,704 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523100 2023-11-26 17:08:45,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3487373.3333333335, ans=0.125 2023-11-26 17:08:52,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3487440.0, ans=0.125 2023-11-26 17:09:04,400 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6100, loss[loss=0.06787, simple_loss=0.09673, pruned_loss=0.01058, audio_tagging_loss=0.00893, over 14550.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09062, pruned_loss=0.01247, audio_tagging_loss=0.008517, over 3044038.71 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:09:06,181 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.53 vs. limit=22.5 2023-11-26 17:09:08,050 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 8.667e+01 9.163e+01 9.920e+01 1.251e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-26 17:09:25,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3487573.3333333335, ans=0.125 2023-11-26 17:09:29,471 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523150 2023-11-26 17:09:29,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3487640.0, ans=0.125 2023-11-26 17:09:52,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3487773.3333333335, ans=0.125 2023-11-26 17:09:54,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3487773.3333333335, ans=0.125 2023-11-26 17:10:00,754 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6150, loss[loss=0.06366, simple_loss=0.08643, pruned_loss=0.01147, audio_tagging_loss=0.00897, over 14902.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09063, pruned_loss=0.01241, audio_tagging_loss=0.008546, over 3042053.42 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:10:09,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3487840.0, ans=0.125 2023-11-26 17:10:13,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3487906.6666666665, ans=0.0 2023-11-26 17:10:17,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3487906.6666666665, ans=0.0 2023-11-26 17:10:23,201 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.90 vs. 
limit=12.0 2023-11-26 17:10:24,727 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523200 2023-11-26 17:10:26,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3487973.3333333335, ans=0.2 2023-11-26 17:10:29,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3487973.3333333335, ans=0.0 2023-11-26 17:10:42,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3488040.0, ans=0.125 2023-11-26 17:10:50,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3488106.6666666665, ans=0.125 2023-11-26 17:10:56,666 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6200, loss[loss=0.06154, simple_loss=0.08147, pruned_loss=0.0106, audio_tagging_loss=0.0102, over 14701.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08991, pruned_loss=0.01241, audio_tagging_loss=0.008669, over 3048723.81 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:10:56,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3488173.3333333335, ans=0.125 2023-11-26 17:10:59,880 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.581e+01 8.701e+01 9.346e+01 1.022e+02 1.320e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 17:11:03,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3488173.3333333335, ans=0.2 2023-11-26 17:11:05,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3488173.3333333335, ans=0.2 2023-11-26 17:11:22,131 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523250 2023-11-26 17:11:28,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3488306.6666666665, ans=0.125 2023-11-26 17:11:32,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3488373.3333333335, ans=0.0 2023-11-26 17:11:34,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3488373.3333333335, ans=0.125 2023-11-26 17:11:45,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3488440.0, ans=0.125 2023-11-26 17:11:51,562 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=22.5 2023-11-26 17:11:51,720 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.77 vs. limit=10.0 2023-11-26 17:11:52,831 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6250, loss[loss=0.07064, simple_loss=0.09982, pruned_loss=0.01383, audio_tagging_loss=0.006895, over 14740.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08972, pruned_loss=0.01224, audio_tagging_loss=0.008732, over 3049890.64 frames. 
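A consistency check on the loss reporting: throughout this section the logged totals satisfy, to rounding,

    $\text{loss} \approx 0.5\,\text{simple\_loss} + \text{pruned\_loss} + 1.0\,\text{audio\_tagging\_loss}$

e.g. for the batch-6250 totals just above, 0.5 x 0.08972 + 0.01224 + 0.008732 ~= 0.06583. The 0.5 weight on simple_loss is inferred from this arithmetic alone; the training code may apply different scales in other phases (e.g. early warm-up).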
], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:12:00,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3488506.6666666665, ans=0.0 2023-11-26 17:12:04,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3488573.3333333335, ans=0.125 2023-11-26 17:12:17,958 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523300 2023-11-26 17:12:40,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3488773.3333333335, ans=0.125 2023-11-26 17:12:44,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3488773.3333333335, ans=0.125 2023-11-26 17:12:49,234 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6300, loss[loss=0.09631, simple_loss=0.1422, pruned_loss=0.01953, audio_tagging_loss=0.005677, over 15947.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09091, pruned_loss=0.01243, audio_tagging_loss=0.008691, over 3047347.18 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:12:52,401 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.161e+01 8.680e+01 9.348e+01 1.004e+02 1.184e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-26 17:13:06,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3488906.6666666665, ans=0.125 2023-11-26 17:13:13,202 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523350 2023-11-26 17:13:24,243 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.21 vs. limit=12.0 2023-11-26 17:13:44,470 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6350, loss[loss=0.07707, simple_loss=0.1153, pruned_loss=0.008917, audio_tagging_loss=0.0105, over 14598.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08986, pruned_loss=0.01206, audio_tagging_loss=0.00891, over 3040831.22 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:13:48,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3489173.3333333335, ans=0.1 2023-11-26 17:13:51,098 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.25 vs. limit=10.0 2023-11-26 17:13:52,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3489173.3333333335, ans=0.2 2023-11-26 17:14:09,534 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523400 2023-11-26 17:14:19,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3489373.3333333335, ans=0.0 2023-11-26 17:14:40,440 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6400, loss[loss=0.0744, simple_loss=0.1018, pruned_loss=0.0121, audio_tagging_loss=0.01138, over 14948.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08933, pruned_loss=0.01198, audio_tagging_loss=0.008987, over 3038348.98 frames. 
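The recurring "ScheduledFloat: name=..., batch_count=..., ans=..." lines come from modules printing the current value of a batch-count-scheduled hyperparameter (skip rates, dropout probabilities, balancer limits). A minimal sketch of such an object follows; the real class in icefall's scaling.py is richer (schedule arithmetic, clamping), so treat this as a reading aid only.

class ScheduledFloat:
    def __init__(self, *points, default=0.0):
        # points are (batch_count, value) pairs of a piecewise-linear schedule.
        self.points = sorted(points)
        self.default = default
        self.batch_count = None  # set by the training loop each step

    def __float__(self):
        if self.batch_count is None or not self.points:
            return float(self.default)
        x = self.batch_count
        if x <= self.points[0][0]:
            return float(self.points[0][1])
        if x >= self.points[-1][0]:
            return float(self.points[-1][1])
        for (x0, y0), (x1, y1) in zip(self.points, self.points[1:]):
            if x0 <= x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        return float(self.default)  # unreachable for sorted points

# At batch_count ~3.49e6, far past any schedule breakpoint, the value is
# pinned at its final level -- which is why the 'ans' fields above repeat.
p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
p.batch_count = 3488173.33
assert abs(float(p) - 0.1) < 1e-9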
], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:14:45,172 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.093e+01 8.723e+01 9.338e+01 1.021e+02 1.186e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 17:14:59,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3489573.3333333335, ans=0.125 2023-11-26 17:15:01,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3489573.3333333335, ans=0.09899494936611666 2023-11-26 17:15:02,098 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.25 vs. limit=22.5 2023-11-26 17:15:05,626 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523450 2023-11-26 17:15:11,164 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:15:18,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3489706.6666666665, ans=0.09899494936611666 2023-11-26 17:15:18,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3489706.6666666665, ans=0.0 2023-11-26 17:15:28,636 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.64 vs. limit=15.0 2023-11-26 17:15:37,384 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6450, loss[loss=0.05204, simple_loss=0.0687, pruned_loss=0.008057, audio_tagging_loss=0.009631, over 14687.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08815, pruned_loss=0.01183, audio_tagging_loss=0.00909, over 3035173.79 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:15:42,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3489840.0, ans=0.2 2023-11-26 17:15:48,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3489906.6666666665, ans=0.125 2023-11-26 17:15:49,563 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.90 vs. limit=15.0 2023-11-26 17:15:58,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3489973.3333333335, ans=0.125 2023-11-26 17:16:01,351 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523500 2023-11-26 17:16:01,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3489973.3333333335, ans=0.0 2023-11-26 17:16:20,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3490040.0, ans=0.0 2023-11-26 17:16:26,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3490106.6666666665, ans=0.125 2023-11-26 17:16:31,248 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.55 vs. 
limit=15.0 2023-11-26 17:16:31,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3490173.3333333335, ans=0.125 2023-11-26 17:16:32,621 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6500, loss[loss=0.07489, simple_loss=0.103, pruned_loss=0.01775, audio_tagging_loss=0.00562, over 16173.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08805, pruned_loss=0.01203, audio_tagging_loss=0.008999, over 3041976.45 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:16:37,905 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.378e+01 8.749e+01 9.498e+01 1.005e+02 1.590e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 17:16:57,456 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523550 2023-11-26 17:17:12,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3490373.3333333335, ans=0.0 2023-11-26 17:17:28,345 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6550, loss[loss=0.07144, simple_loss=0.1003, pruned_loss=0.01196, audio_tagging_loss=0.009326, over 15084.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08822, pruned_loss=0.01202, audio_tagging_loss=0.008879, over 3051827.67 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:17:38,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3490506.6666666665, ans=0.125 2023-11-26 17:17:53,225 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523600 2023-11-26 17:18:09,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3490706.6666666665, ans=0.125 2023-11-26 17:18:09,374 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.21 vs. limit=15.0 2023-11-26 17:18:24,779 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6600, loss[loss=0.08138, simple_loss=0.1198, pruned_loss=0.01539, audio_tagging_loss=0.006104, over 15222.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08868, pruned_loss=0.01207, audio_tagging_loss=0.008769, over 3048072.55 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:18:30,615 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.701e+01 8.733e+01 9.478e+01 1.039e+02 1.396e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-26 17:18:30,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3490840.0, ans=0.125 2023-11-26 17:18:37,515 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.58 vs. limit=10.0 2023-11-26 17:18:48,759 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523650 2023-11-26 17:18:48,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3490973.3333333335, ans=0.125 2023-11-26 17:19:14,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3491106.6666666665, ans=0.2 2023-11-26 17:19:20,545 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6650, loss[loss=0.09008, simple_loss=0.1225, pruned_loss=0.01847, audio_tagging_loss=0.01035, over 15053.00 frames. 
], tot_loss[loss=0.06573, simple_loss=0.08975, pruned_loss=0.01215, audio_tagging_loss=0.008707, over 3045736.13 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:19:38,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3491240.0, ans=0.125 2023-11-26 17:19:45,639 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523700 2023-11-26 17:19:48,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3491306.6666666665, ans=0.125 2023-11-26 17:19:51,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3491306.6666666665, ans=0.0 2023-11-26 17:19:53,067 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.82 vs. limit=15.0 2023-11-26 17:20:00,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3491373.3333333335, ans=0.015 2023-11-26 17:20:12,078 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.61 vs. limit=15.0 2023-11-26 17:20:15,804 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6700, loss[loss=0.07358, simple_loss=0.1062, pruned_loss=0.01218, audio_tagging_loss=0.008297, over 15182.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.0906, pruned_loss=0.01244, audio_tagging_loss=0.008708, over 3046092.77 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:20:21,688 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.319e+01 8.660e+01 9.419e+01 1.025e+02 1.437e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 17:20:21,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3491506.6666666665, ans=0.125 2023-11-26 17:20:23,315 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.01 vs. limit=15.0 2023-11-26 17:20:29,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3491573.3333333335, ans=0.1 2023-11-26 17:20:41,455 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523750 2023-11-26 17:20:43,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3491640.0, ans=0.125 2023-11-26 17:20:46,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3491640.0, ans=0.0 2023-11-26 17:21:12,553 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6750, loss[loss=0.05245, simple_loss=0.066, pruned_loss=0.009179, audio_tagging_loss=0.01027, over 15401.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09036, pruned_loss=0.01237, audio_tagging_loss=0.008694, over 3044665.90 frames. 
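The tot_loss[...] frame count hovers near 3.0e6 while individual batches contribute only ~15e3 frames, which is what a geometrically decayed running sum with a time constant of roughly 200 batches would produce (200 x 15e3 ~= 3.0e6). Below is a sketch under that assumption; the exact bookkeeping in train_asr.py may differ.

class RunningLoss:
    def __init__(self, decay_batches: float = 200.0):
        self.alpha = 1.0 - 1.0 / decay_batches  # per-batch decay
        self.sums: dict[str, float] = {}
        self.frames = 0.0

    def update(self, losses: dict, num_frames: float) -> str:
        # Frame-weighted sums decay geometrically, so 'over N frames'
        # settles near decay_batches * frames_per_batch.
        self.frames = self.frames * self.alpha + num_frames
        for k, v in losses.items():
            self.sums[k] = self.sums.get(k, 0.0) * self.alpha + v * num_frames
        body = ", ".join(f"{k}={v / self.frames:.4g}" for k, v in self.sums.items())
        return f"tot_loss[{body}, over {self.frames:.2f} frames.]"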
], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:21:19,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3491840.0, ans=0.1 2023-11-26 17:21:25,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3491906.6666666665, ans=0.125 2023-11-26 17:21:36,694 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523800 2023-11-26 17:21:38,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3491973.3333333335, ans=0.125 2023-11-26 17:21:39,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3491973.3333333335, ans=0.0 2023-11-26 17:22:08,687 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6800, loss[loss=0.04936, simple_loss=0.05957, pruned_loss=0.009921, audio_tagging_loss=0.009651, over 16728.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09007, pruned_loss=0.01247, audio_tagging_loss=0.008641, over 3044236.97 frames. ], batch size: 64, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:22:13,936 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.758e+01 8.963e+01 9.408e+01 1.006e+02 1.345e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-26 17:22:31,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3492306.6666666665, ans=0.125 2023-11-26 17:22:33,156 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523850 2023-11-26 17:22:41,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3492373.3333333335, ans=0.125 2023-11-26 17:22:57,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3492440.0, ans=0.125 2023-11-26 17:22:58,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3492440.0, ans=0.125 2023-11-26 17:23:00,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3492440.0, ans=0.125 2023-11-26 17:23:03,564 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6850, loss[loss=0.0702, simple_loss=0.0917, pruned_loss=0.01316, audio_tagging_loss=0.01119, over 15570.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08986, pruned_loss=0.0124, audio_tagging_loss=0.008645, over 3037537.26 frames. 
], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:23:14,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3492573.3333333335, ans=0.0 2023-11-26 17:23:14,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3492573.3333333335, ans=0.125 2023-11-26 17:23:17,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3492573.3333333335, ans=0.125 2023-11-26 17:23:29,033 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523900 2023-11-26 17:23:32,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=3492640.0, ans=15.0 2023-11-26 17:23:40,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3492706.6666666665, ans=0.2 2023-11-26 17:23:48,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3492773.3333333335, ans=0.0 2023-11-26 17:23:59,738 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6900, loss[loss=0.0588, simple_loss=0.06997, pruned_loss=0.01352, audio_tagging_loss=0.0103, over 14714.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08945, pruned_loss=0.01236, audio_tagging_loss=0.008729, over 3038591.49 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:24:07,189 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.973e+01 8.659e+01 9.332e+01 1.017e+02 1.232e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-26 17:24:24,273 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 523950 2023-11-26 17:24:28,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3492973.3333333335, ans=0.5 2023-11-26 17:24:39,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3493040.0, ans=0.125 2023-11-26 17:24:42,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3493040.0, ans=0.1 2023-11-26 17:24:44,932 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 17:24:49,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3493106.6666666665, ans=0.0 2023-11-26 17:24:53,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3493106.6666666665, ans=0.125 2023-11-26 17:24:56,071 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 6950, loss[loss=0.05756, simple_loss=0.07744, pruned_loss=0.01152, audio_tagging_loss=0.007322, over 15856.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08914, pruned_loss=0.01232, audio_tagging_loss=0.00872, over 3030638.09 frames. 
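The "Whitening: name=..., metric=X vs. limit=Y" lines report how far a layer's activation covariance is from white; values near 1.0 mean nearly isotropic features, and the module only intervenes once the metric crosses its limit. The specific metric below (mean squared eigenvalue over squared mean eigenvalue, per group) is an assumption in the spirit of the Whiten module in scaling.py, not a verified transcription of it.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    n, c = x.shape
    g = c // num_groups
    x = x.reshape(n, num_groups, g).permute(1, 0, 2)  # (groups, n, g)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / n                   # per-group covariance
    eigs = torch.linalg.eigvalsh(cov)
    # Equals 1.0 iff all eigenvalues are equal (perfectly white features).
    return ((eigs**2).mean() / (eigs.mean() ** 2 + 1e-20)).item()

# White noise scores close to 1; strongly correlated features score high.
assert whitening_metric(torch.randn(1024, 192)) < 1.5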
], batch size: 62, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:24:57,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3493173.3333333335, ans=0.125 2023-11-26 17:25:07,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3493240.0, ans=0.0 2023-11-26 17:25:19,486 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524000 2023-11-26 17:25:21,381 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-524000.pt 2023-11-26 17:25:28,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3493306.6666666665, ans=0.0 2023-11-26 17:25:34,724 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:25:50,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3493440.0, ans=0.0 2023-11-26 17:25:51,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3493440.0, ans=0.04949747468305833 2023-11-26 17:25:53,707 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7000, loss[loss=0.06746, simple_loss=0.09354, pruned_loss=0.01074, audio_tagging_loss=0.009952, over 14606.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08962, pruned_loss=0.01246, audio_tagging_loss=0.008758, over 3034043.72 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:26:00,102 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.638e+01 8.888e+01 9.554e+01 1.025e+02 1.624e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 17:26:12,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3493573.3333333335, ans=0.125 2023-11-26 17:26:13,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3493573.3333333335, ans=0.125 2023-11-26 17:26:19,192 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524050 2023-11-26 17:26:27,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3493706.6666666665, ans=0.125 2023-11-26 17:26:36,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3493706.6666666665, ans=0.0 2023-11-26 17:26:49,084 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7050, loss[loss=0.05377, simple_loss=0.06984, pruned_loss=0.01003, audio_tagging_loss=0.008826, over 15796.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09029, pruned_loss=0.01242, audio_tagging_loss=0.008676, over 3041131.43 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:26:57,060 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2023-11-26 17:27:02,002 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.81 vs. 
limit=15.0 2023-11-26 17:27:11,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3493973.3333333335, ans=0.0 2023-11-26 17:27:12,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3493973.3333333335, ans=0.125 2023-11-26 17:27:14,139 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524100 2023-11-26 17:27:18,100 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.03 vs. limit=15.0 2023-11-26 17:27:46,079 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7100, loss[loss=0.05744, simple_loss=0.0792, pruned_loss=0.009486, audio_tagging_loss=0.008354, over 17031.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08973, pruned_loss=0.01229, audio_tagging_loss=0.008745, over 3052229.23 frames. ], batch size: 64, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:27:52,378 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.497e+01 8.801e+01 9.665e+01 1.022e+02 1.314e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-26 17:28:05,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3494240.0, ans=0.125 2023-11-26 17:28:09,411 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524150 2023-11-26 17:28:10,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3494306.6666666665, ans=0.2 2023-11-26 17:28:26,469 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.71 vs. limit=15.0 2023-11-26 17:28:26,929 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:28:34,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3494440.0, ans=0.1 2023-11-26 17:28:38,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3494440.0, ans=0.125 2023-11-26 17:28:40,583 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7150, loss[loss=0.06235, simple_loss=0.08761, pruned_loss=0.01209, audio_tagging_loss=0.006451, over 17971.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09005, pruned_loss=0.01248, audio_tagging_loss=0.008796, over 3054456.45 frames. 
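On the "Saving checkpoint to .../checkpoint-524000.pt" entry a little further up: batch-indexed checkpoints land on round multiples of the save interval (524000 here). A sketch of that trigger; save_fn stands in for icefall's checkpoint writer, which also captures optimizer, scheduler, sampler and grad-scaler state.

from pathlib import Path

def maybe_save_checkpoint(batch_idx: int, every_n: int, exp_dir: Path, save_fn) -> None:
    # Write a batch-indexed checkpoint whenever the global counter hits a
    # multiple of the interval, matching the checkpoint-524000.pt name above.
    if batch_idx > 0 and batch_idx % every_n == 0:
        save_fn(exp_dir / f"checkpoint-{batch_idx}.pt")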
], batch size: 69, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:29:05,566 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524200 2023-11-26 17:29:05,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3494640.0, ans=0.125 2023-11-26 17:29:18,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=3494706.6666666665, ans=0.02 2023-11-26 17:29:19,421 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:29:32,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3494773.3333333335, ans=0.125 2023-11-26 17:29:34,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3494773.3333333335, ans=0.025 2023-11-26 17:29:36,127 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7200, loss[loss=0.04268, simple_loss=0.05295, pruned_loss=0.007057, audio_tagging_loss=0.009145, over 14341.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08938, pruned_loss=0.01229, audio_tagging_loss=0.008932, over 3047703.18 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:29:43,587 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.953e+01 9.579e+01 1.038e+02 1.531e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-26 17:29:48,857 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2023-11-26 17:29:49,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3494906.6666666665, ans=0.125 2023-11-26 17:29:55,879 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.40 vs. limit=15.0 2023-11-26 17:30:01,792 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524250 2023-11-26 17:30:32,731 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7250, loss[loss=0.07311, simple_loss=0.106, pruned_loss=0.01314, audio_tagging_loss=0.006983, over 16098.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.0898, pruned_loss=0.0124, audio_tagging_loss=0.008838, over 3052980.00 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:30:53,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3495306.6666666665, ans=0.0 2023-11-26 17:30:56,618 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524300 2023-11-26 17:31:18,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3495440.0, ans=0.2 2023-11-26 17:31:25,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3495440.0, ans=0.1 2023-11-26 17:31:28,172 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7300, loss[loss=0.05687, simple_loss=0.08344, pruned_loss=0.008397, audio_tagging_loss=0.006755, over 16087.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08952, pruned_loss=0.01233, audio_tagging_loss=0.008867, over 3042730.94 frames. 
], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:31:35,616 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.194e+01 8.863e+01 9.398e+01 1.006e+02 1.192e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-26 17:31:52,479 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524350 2023-11-26 17:31:58,869 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.76 vs. limit=12.0 2023-11-26 17:32:23,233 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7350, loss[loss=0.08044, simple_loss=0.1231, pruned_loss=0.01318, audio_tagging_loss=0.005733, over 16826.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08976, pruned_loss=0.01225, audio_tagging_loss=0.00872, over 3044978.48 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:32:33,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3495906.6666666665, ans=0.2 2023-11-26 17:32:48,578 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524400 2023-11-26 17:32:51,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3495973.3333333335, ans=0.125 2023-11-26 17:32:52,642 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.26 vs. limit=15.0 2023-11-26 17:32:55,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3495973.3333333335, ans=0.125 2023-11-26 17:32:56,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3496040.0, ans=0.2 2023-11-26 17:33:00,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3496040.0, ans=0.0 2023-11-26 17:33:20,293 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7400, loss[loss=0.06016, simple_loss=0.08847, pruned_loss=0.009851, audio_tagging_loss=0.006074, over 14722.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.09012, pruned_loss=0.01224, audio_tagging_loss=0.008613, over 3047421.88 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:33:27,564 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.789e+01 8.991e+01 9.600e+01 1.026e+02 1.969e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-26 17:33:33,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3496240.0, ans=0.125 2023-11-26 17:33:44,197 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524450 2023-11-26 17:34:15,327 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7450, loss[loss=0.03947, simple_loss=0.04208, pruned_loss=0.007814, audio_tagging_loss=0.01061, over 13990.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08973, pruned_loss=0.01218, audio_tagging_loss=0.008589, over 3040001.02 frames. 
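The grad_scale field in the batch headers flips between neighbouring powers of two (16.0 and 32.0 in this stretch), the signature of dynamic fp16 loss scaling: the scaler halves its scale on an overflowing step and grows it back after a run of clean ones. A sketch using PyTorch's real amp API; model, optimizer and compute_loss are placeholders.

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

def train_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)     # skipped internally if grads overflowed
    scaler.update()            # halves or grows the scale accordingly
    return scaler.get_scale()  # the value logged as grad_scale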
], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:34:19,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3496506.6666666665, ans=0.125 2023-11-26 17:34:40,335 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524500 2023-11-26 17:34:48,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3496706.6666666665, ans=0.125 2023-11-26 17:34:50,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3496706.6666666665, ans=0.125 2023-11-26 17:34:54,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3496706.6666666665, ans=0.125 2023-11-26 17:35:06,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3496773.3333333335, ans=0.0 2023-11-26 17:35:07,261 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.66 vs. limit=15.0 2023-11-26 17:35:09,194 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:35:11,156 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7500, loss[loss=0.04726, simple_loss=0.06436, pruned_loss=0.006548, audio_tagging_loss=0.008536, over 14812.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08943, pruned_loss=0.0121, audio_tagging_loss=0.008572, over 3044592.48 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:35:19,591 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.260e+01 8.894e+01 9.302e+01 1.027e+02 1.348e+02, threshold=1.860e+02, percent-clipped=1.0 2023-11-26 17:35:29,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3496906.6666666665, ans=0.125 2023-11-26 17:35:36,263 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524550 2023-11-26 17:35:52,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3497040.0, ans=0.2 2023-11-26 17:36:02,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3497106.6666666665, ans=0.0 2023-11-26 17:36:07,362 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7550, loss[loss=0.07185, simple_loss=0.1037, pruned_loss=0.01242, audio_tagging_loss=0.007574, over 16076.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08909, pruned_loss=0.0121, audio_tagging_loss=0.008611, over 3048322.47 frames. 
], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:36:11,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3497173.3333333335, ans=0.1 2023-11-26 17:36:13,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3497173.3333333335, ans=0.0 2023-11-26 17:36:20,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3497240.0, ans=0.125 2023-11-26 17:36:21,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3497240.0, ans=0.95 2023-11-26 17:36:26,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3497240.0, ans=0.95 2023-11-26 17:36:32,054 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524600 2023-11-26 17:36:48,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3497373.3333333335, ans=0.125 2023-11-26 17:36:53,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3497440.0, ans=0.2 2023-11-26 17:36:57,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3497440.0, ans=0.2 2023-11-26 17:36:58,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3497440.0, ans=0.07 2023-11-26 17:37:00,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3497440.0, ans=0.125 2023-11-26 17:37:03,389 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7600, loss[loss=0.06122, simple_loss=0.08543, pruned_loss=0.009869, audio_tagging_loss=0.008637, over 15808.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08882, pruned_loss=0.01222, audio_tagging_loss=0.008577, over 3046191.52 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:37:03,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3497506.6666666665, ans=0.0 2023-11-26 17:37:05,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3497506.6666666665, ans=0.0 2023-11-26 17:37:10,767 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.143e+01 8.779e+01 9.690e+01 1.052e+02 1.195e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-26 17:37:15,025 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.19 vs. limit=15.0 2023-11-26 17:37:27,973 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524650 2023-11-26 17:37:43,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3497706.6666666665, ans=0.1 2023-11-26 17:37:44,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3497706.6666666665, ans=0.125 2023-11-26 17:37:58,204 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.49 vs. 
limit=15.0 2023-11-26 17:37:58,861 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7650, loss[loss=0.07108, simple_loss=0.1088, pruned_loss=0.00913, audio_tagging_loss=0.007542, over 15759.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.0896, pruned_loss=0.01225, audio_tagging_loss=0.00852, over 3044112.97 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:38:09,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3497906.6666666665, ans=0.025 2023-11-26 17:38:15,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3497906.6666666665, ans=0.0 2023-11-26 17:38:20,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3497973.3333333335, ans=0.125 2023-11-26 17:38:23,439 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524700 2023-11-26 17:38:46,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3498106.6666666665, ans=0.125 2023-11-26 17:38:51,923 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.99 vs. limit=15.0 2023-11-26 17:38:54,585 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7700, loss[loss=0.04904, simple_loss=0.06847, pruned_loss=0.005387, audio_tagging_loss=0.00942, over 14810.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08852, pruned_loss=0.01209, audio_tagging_loss=0.008579, over 3039550.89 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:38:54,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3498173.3333333335, ans=0.1 2023-11-26 17:38:59,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3498173.3333333335, ans=0.125 2023-11-26 17:39:02,510 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.452e+01 9.021e+01 9.589e+01 1.019e+02 1.364e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-26 17:39:06,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3498240.0, ans=0.125 2023-11-26 17:39:18,500 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524750 2023-11-26 17:39:21,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3498306.6666666665, ans=0.1 2023-11-26 17:39:25,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3498306.6666666665, ans=0.125 2023-11-26 17:39:28,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3498373.3333333335, ans=0.125 2023-11-26 17:39:45,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3498440.0, ans=0.125 2023-11-26 17:39:50,743 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7750, loss[loss=0.07336, simple_loss=0.09607, pruned_loss=0.01791, audio_tagging_loss=0.007422, over 14303.00 frames. 
], tot_loss[loss=0.0651, simple_loss=0.08851, pruned_loss=0.01215, audio_tagging_loss=0.008693, over 3030900.77 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:39:59,877 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.15 vs. limit=15.0 2023-11-26 17:40:00,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3498573.3333333335, ans=0.125 2023-11-26 17:40:05,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3498573.3333333335, ans=0.125 2023-11-26 17:40:10,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3498573.3333333335, ans=0.125 2023-11-26 17:40:13,851 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.02 vs. limit=15.0 2023-11-26 17:40:14,461 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524800 2023-11-26 17:40:23,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3498706.6666666665, ans=0.035 2023-11-26 17:40:26,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3498706.6666666665, ans=0.125 2023-11-26 17:40:45,533 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7800, loss[loss=0.05943, simple_loss=0.08433, pruned_loss=0.009603, audio_tagging_loss=0.00766, over 15433.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08927, pruned_loss=0.01233, audio_tagging_loss=0.008687, over 3028573.73 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:40:46,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3498840.0, ans=0.0 2023-11-26 17:40:54,548 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.572e+01 8.862e+01 9.365e+01 1.011e+02 1.202e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 17:41:10,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3498973.3333333335, ans=0.125 2023-11-26 17:41:10,772 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.18 vs. limit=15.0 2023-11-26 17:41:11,071 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524850 2023-11-26 17:41:25,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3499040.0, ans=0.0 2023-11-26 17:41:34,071 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.41 vs. limit=15.0 2023-11-26 17:41:35,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3499106.6666666665, ans=0.0 2023-11-26 17:41:39,853 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.03 vs. 
2023-11-26 17:41:41,884 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7850, loss[loss=0.04863, simple_loss=0.05359, pruned_loss=0.01085, audio_tagging_loss=0.01099, over 15296.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.0889, pruned_loss=0.01223, audio_tagging_loss=0.008772, over 3026758.44 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:41:43,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3499173.3333333335, ans=0.125 2023-11-26 17:41:45,051 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.82 vs. limit=15.0 2023-11-26 17:41:51,802 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.82 vs. limit=15.0 2023-11-26 17:42:06,279 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524900 2023-11-26 17:42:08,982 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.51 vs. limit=12.0 2023-11-26 17:42:17,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3499373.3333333335, ans=0.0 2023-11-26 17:42:26,370 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.86 vs. limit=15.0 2023-11-26 17:42:35,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3499440.0, ans=0.0 2023-11-26 17:42:35,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3499440.0, ans=0.125 2023-11-26 17:42:38,165 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7900, loss[loss=0.08215, simple_loss=0.1235, pruned_loss=0.01301, audio_tagging_loss=0.007415, over 16464.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08923, pruned_loss=0.01226, audio_tagging_loss=0.00884, over 3035612.42 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:42:46,459 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.148e+01 9.062e+01 9.709e+01 1.039e+02 1.444e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-26 17:42:48,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3499573.3333333335, ans=0.1 2023-11-26 17:42:53,312 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.28 vs.
limit=15.0 2023-11-26 17:42:56,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3499573.3333333335, ans=0.1 2023-11-26 17:42:56,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3499573.3333333335, ans=0.125 2023-11-26 17:43:01,443 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 524950 2023-11-26 17:43:02,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3499640.0, ans=0.0 2023-11-26 17:43:24,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3499773.3333333335, ans=0.0 2023-11-26 17:43:29,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3499773.3333333335, ans=0.125 2023-11-26 17:43:33,419 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 7950, loss[loss=0.0596, simple_loss=0.08463, pruned_loss=0.008353, audio_tagging_loss=0.008932, over 14869.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08923, pruned_loss=0.01223, audio_tagging_loss=0.008964, over 3031973.43 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:43:37,375 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.07 vs. limit=15.0 2023-11-26 17:43:42,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3499840.0, ans=0.125 2023-11-26 17:43:49,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3499906.6666666665, ans=0.125 2023-11-26 17:43:50,579 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 17:43:59,152 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525000 2023-11-26 17:43:59,489 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.93 vs. limit=6.0 2023-11-26 17:44:00,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3499973.3333333335, ans=0.05 2023-11-26 17:44:13,730 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.25 vs. limit=10.0 2023-11-26 17:44:15,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3500040.0, ans=0.2 2023-11-26 17:44:28,937 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8000, loss[loss=0.08995, simple_loss=0.1311, pruned_loss=0.01696, audio_tagging_loss=0.007451, over 16165.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08867, pruned_loss=0.01206, audio_tagging_loss=0.009089, over 3035178.73 frames. 
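], batch size: 57, lr: 1.53e-03, grad_scale: 32.0

Each train_asr.py:1235 record reports per-frame losses: loss[...] is the current batch and tot_loss[...] a running average over roughly the last three million frames. With this run's weights (simple_loss_scale=0.5, audio_tagging_loss_scale=1.0, pruned loss unscaled), a plain weighted sum reproduces the logged totals. The helper below is a sketch of that arithmetic, not code copied from train_asr.py:

def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                  simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Batch 8000 above: 0.5 * 0.1311 + 0.01696 + 1.0 * 0.007451 ~= 0.08995
assert abs(combined_loss(0.1311, 0.01696, 0.007451) - 0.08995) < 5e-4
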
2023-11-26 17:44:33,367 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.16 vs. limit=15.0 2023-11-26 17:44:38,611 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.125e+01 8.982e+01 9.655e+01 1.047e+02 1.497e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-26 17:44:54,202 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525050 2023-11-26 17:45:03,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3500373.3333333335, ans=0.125 2023-11-26 17:45:21,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3500440.0, ans=0.125 2023-11-26 17:45:23,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3500440.0, ans=0.0 2023-11-26 17:45:25,876 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8050, loss[loss=0.07028, simple_loss=0.09248, pruned_loss=0.01278, audio_tagging_loss=0.01126, over 15281.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08865, pruned_loss=0.01209, audio_tagging_loss=0.009183, over 3036387.55 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:45:30,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3500506.6666666665, ans=0.125 2023-11-26 17:45:46,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3500640.0, ans=0.125 2023-11-26 17:45:49,201 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525100 2023-11-26 17:46:14,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3500773.3333333335, ans=0.125 2023-11-26 17:46:21,176 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8100, loss[loss=0.06365, simple_loss=0.08027, pruned_loss=0.0134, audio_tagging_loss=0.01012, over 14927.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.0886, pruned_loss=0.01219, audio_tagging_loss=0.009088, over 3034680.79 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:46:25,142 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.66 vs. limit=10.0 2023-11-26 17:46:29,604 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.030e+01 8.964e+01 9.424e+01 1.017e+02 1.199e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 17:46:40,836 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.40 vs.
limit=22.5 2023-11-26 17:46:42,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3500973.3333333335, ans=0.125 2023-11-26 17:46:45,705 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525150 2023-11-26 17:46:52,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3500973.3333333335, ans=0.125 2023-11-26 17:46:57,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3501040.0, ans=0.125 2023-11-26 17:47:07,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3501106.6666666665, ans=0.1 2023-11-26 17:47:16,041 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8150, loss[loss=0.05642, simple_loss=0.07757, pruned_loss=0.007807, audio_tagging_loss=0.00983, over 14741.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08835, pruned_loss=0.01198, audio_tagging_loss=0.008925, over 3036603.48 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:47:18,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3501173.3333333335, ans=0.0 2023-11-26 17:47:28,961 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.52 vs. limit=22.5 2023-11-26 17:47:33,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3501240.0, ans=0.0 2023-11-26 17:47:34,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3501240.0, ans=0.2 2023-11-26 17:47:37,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3501240.0, ans=0.1 2023-11-26 17:47:41,622 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525200 2023-11-26 17:47:45,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3501306.6666666665, ans=0.0 2023-11-26 17:48:08,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3501440.0, ans=0.2 2023-11-26 17:48:13,421 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8200, loss[loss=0.05133, simple_loss=0.06834, pruned_loss=0.007795, audio_tagging_loss=0.00937, over 16494.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08866, pruned_loss=0.01199, audio_tagging_loss=0.008786, over 3035954.10 frames. ], batch size: 62, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:48:16,603 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
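Number of tokens: 24

These WARNING records come from one-second AudioSet clips carrying placeholder transcripts: after the encoder's temporal subsampling, 100 input frames become 23 encoder frames, fewer than the 24 BPE tokens, so the transducer loss is undefined and the cut is dropped. A sketch of the implied filter; the subsampling formula is one plausible convolutional arithmetic that reproduces the logged 100 -> 23, not necessarily the recipe's exact encoder_embed computation:

def num_frames_after_subsampling(t: int) -> int:
    # plausible two-stage conv subsampling: 100 -> 23
    return ((t - 7) // 2 + 1) // 2

def keep_cut(num_input_frames: int, num_tokens: int) -> bool:
    # a cut is unusable for the transducer loss when the encoder emits
    # fewer frames than the supervision has tokens
    return num_frames_after_subsampling(num_input_frames) >= num_tokens

assert num_frames_after_subsampling(100) == 23
assert keep_cut(100, 24) is False  # the dummy-text AudioSet cuts get dropped
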
2023-11-26 17:48:17,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3501506.6666666665, ans=0.125 2023-11-26 17:48:22,946 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 8.766e+01 9.406e+01 1.001e+02 1.239e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 17:48:36,836 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525250 2023-11-26 17:48:55,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3501706.6666666665, ans=0.0 2023-11-26 17:48:56,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3501706.6666666665, ans=0.0 2023-11-26 17:49:08,676 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8250, loss[loss=0.07899, simple_loss=0.1154, pruned_loss=0.01321, audio_tagging_loss=0.008059, over 15674.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.0892, pruned_loss=0.01204, audio_tagging_loss=0.008663, over 3041347.31 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:49:20,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3501906.6666666665, ans=0.125 2023-11-26 17:49:22,016 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.46 vs. limit=15.0 2023-11-26 17:49:25,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3501906.6666666665, ans=0.0 2023-11-26 17:49:29,911 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.25 vs. limit=15.0 2023-11-26 17:49:33,200 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525300 2023-11-26 17:49:35,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3501973.3333333335, ans=0.0 2023-11-26 17:50:03,796 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8300, loss[loss=0.06147, simple_loss=0.07688, pruned_loss=0.01296, audio_tagging_loss=0.01007, over 14798.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08895, pruned_loss=0.01206, audio_tagging_loss=0.008723, over 3042833.68 frames.
], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:50:04,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3502173.3333333335, ans=0.125 2023-11-26 17:50:05,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3502173.3333333335, ans=0.1 2023-11-26 17:50:14,252 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.129e+01 8.963e+01 9.594e+01 1.033e+02 1.265e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-26 17:50:28,595 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525350 2023-11-26 17:50:46,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3502440.0, ans=0.125 2023-11-26 17:50:58,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3502506.6666666665, ans=0.125 2023-11-26 17:50:59,498 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8350, loss[loss=0.06215, simple_loss=0.09061, pruned_loss=0.007242, audio_tagging_loss=0.009605, over 15862.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08951, pruned_loss=0.01214, audio_tagging_loss=0.008693, over 3044585.44 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:50:59,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3502506.6666666665, ans=0.0 2023-11-26 17:51:06,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3502506.6666666665, ans=0.0 2023-11-26 17:51:23,990 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525400 2023-11-26 17:51:32,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3502706.6666666665, ans=0.125 2023-11-26 17:51:36,898 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.49 vs. limit=22.5 2023-11-26 17:51:48,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3502773.3333333335, ans=0.04949747468305833 2023-11-26 17:51:56,009 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8400, loss[loss=0.0459, simple_loss=0.05391, pruned_loss=0.007648, audio_tagging_loss=0.01129, over 14759.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08925, pruned_loss=0.01199, audio_tagging_loss=0.008674, over 3049523.88 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:51:59,415 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:52:05,392 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.713e+01 8.821e+01 9.539e+01 1.045e+02 1.757e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-26 17:52:19,645 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525450 2023-11-26 17:52:33,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3503040.0, ans=0.125 2023-11-26 17:52:41,695 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.44 vs. 
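limit=15.0

The grad_scale field in the batch records is the dynamic fp16 loss scale, which is why it moves between 8.0, 16.0 and 32.0 rather than growing monotonically: the scaler halves it whenever scaled gradients overflow and only slowly grows it back. A minimal sketch of that loop using torch.cuda.amp; model, optimizer and the batch tensors are placeholders, not objects from the recipe:

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

def fp16_step(model, optimizer, features, supervisions):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(features, supervisions)
    scaler.scale(loss).backward()  # backward through the scaled loss
    scaler.step(optimizer)         # silently skipped if gradients overflowed
    scaler.update()                # halve the scale on overflow, else grow it slowly
    return scaler.get_scale()      # the number logged as grad_scale
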
2023-11-26 17:52:41,744 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2023-11-26 17:52:50,801 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8450, loss[loss=0.05412, simple_loss=0.0724, pruned_loss=0.008638, audio_tagging_loss=0.009284, over 16026.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08898, pruned_loss=0.01195, audio_tagging_loss=0.008673, over 3051012.26 frames. ], batch size: 63, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:52:53,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3503173.3333333335, ans=0.0 2023-11-26 17:53:16,138 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525500 2023-11-26 17:53:35,663 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.37 vs. limit=15.0 2023-11-26 17:53:46,681 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8500, loss[loss=0.09006, simple_loss=0.1266, pruned_loss=0.02088, audio_tagging_loss=0.005866, over 15481.00 frames. ], tot_loss[loss=0.06453, simple_loss=0.08812, pruned_loss=0.0117, audio_tagging_loss=0.008766, over 3049492.98 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:53:54,044 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.70 vs. limit=15.0 2023-11-26 17:53:58,147 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.806e+01 8.914e+01 9.480e+01 1.019e+02 1.218e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-26 17:54:03,866 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.00 vs. limit=15.0 2023-11-26 17:54:10,869 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525550 2023-11-26 17:54:19,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3503706.6666666665, ans=0.2 2023-11-26 17:54:19,884 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.43 vs. limit=6.0 2023-11-26 17:54:29,483 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.06 vs. limit=22.5 2023-11-26 17:54:31,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3503773.3333333335, ans=0.1 2023-11-26 17:54:42,936 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8550, loss[loss=0.04944, simple_loss=0.0625, pruned_loss=0.00626, audio_tagging_loss=0.01193, over 15164.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08906, pruned_loss=0.01204, audio_tagging_loss=0.008687, over 3045003.72 frames.
], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:54:55,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3503906.6666666665, ans=0.0 2023-11-26 17:55:06,686 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525600 2023-11-26 17:55:08,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3503973.3333333335, ans=0.1 2023-11-26 17:55:26,952 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.99 vs. limit=15.0 2023-11-26 17:55:37,980 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8600, loss[loss=0.05278, simple_loss=0.07849, pruned_loss=0.005921, audio_tagging_loss=0.007614, over 15416.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.0892, pruned_loss=0.01202, audio_tagging_loss=0.008678, over 3043613.34 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:55:49,038 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.001e+01 8.808e+01 9.575e+01 1.022e+02 1.354e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-26 17:55:54,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3504240.0, ans=0.125 2023-11-26 17:56:02,994 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525650 2023-11-26 17:56:15,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3504373.3333333335, ans=0.1 2023-11-26 17:56:33,763 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8650, loss[loss=0.06231, simple_loss=0.08775, pruned_loss=0.0103, audio_tagging_loss=0.008139, over 15817.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08912, pruned_loss=0.01199, audio_tagging_loss=0.008694, over 3043711.45 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:56:35,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3504506.6666666665, ans=0.1 2023-11-26 17:56:47,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3504573.3333333335, ans=0.125 2023-11-26 17:56:58,848 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525700 2023-11-26 17:57:07,866 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.61 vs. limit=12.0 2023-11-26 17:57:14,379 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:57:15,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3504706.6666666665, ans=0.0 2023-11-26 17:57:23,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3504773.3333333335, ans=0.125 2023-11-26 17:57:30,015 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8700, loss[loss=0.04841, simple_loss=0.05926, pruned_loss=0.01128, audio_tagging_loss=0.007503, over 14156.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.09013, pruned_loss=0.01204, audio_tagging_loss=0.008761, over 3043683.41 frames. 
], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:57:41,079 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.778e+01 8.922e+01 9.364e+01 9.934e+01 1.569e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 17:57:42,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3504906.6666666665, ans=0.0 2023-11-26 17:57:49,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3504906.6666666665, ans=0.0 2023-11-26 17:57:53,941 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525750 2023-11-26 17:58:10,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3505040.0, ans=0.0 2023-11-26 17:58:15,936 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.30 vs. limit=15.0 2023-11-26 17:58:25,623 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8750, loss[loss=0.04879, simple_loss=0.06357, pruned_loss=0.006914, audio_tagging_loss=0.01009, over 15722.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09128, pruned_loss=0.01229, audio_tagging_loss=0.008679, over 3048209.01 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:58:25,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3505173.3333333335, ans=0.125 2023-11-26 17:58:29,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3505173.3333333335, ans=0.0 2023-11-26 17:58:30,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3505173.3333333335, ans=0.0 2023-11-26 17:58:35,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3505240.0, ans=0.125 2023-11-26 17:58:50,422 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525800 2023-11-26 17:59:21,327 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8800, loss[loss=0.0551, simple_loss=0.07966, pruned_loss=0.007403, audio_tagging_loss=0.007871, over 14271.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09148, pruned_loss=0.01235, audio_tagging_loss=0.008743, over 3044019.28 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:59:32,886 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.888e+01 9.136e+01 9.670e+01 1.038e+02 1.622e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-26 17:59:40,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3505573.3333333335, ans=0.125 2023-11-26 17:59:41,892 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.45 vs. limit=12.0 2023-11-26 17:59:45,719 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525850 2023-11-26 17:59:45,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3505640.0, ans=0.07 2023-11-26 18:00:17,405 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8850, loss[loss=0.08804, simple_loss=0.1238, pruned_loss=0.01562, audio_tagging_loss=0.01053, over 15579.00 frames. 
], tot_loss[loss=0.06707, simple_loss=0.0918, pruned_loss=0.01244, audio_tagging_loss=0.008726, over 3048606.26 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 18:00:30,100 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 18:00:34,417 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.58 vs. limit=10.0 2023-11-26 18:00:35,561 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.42 vs. limit=15.0 2023-11-26 18:00:40,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3505973.3333333335, ans=0.2 2023-11-26 18:00:41,265 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525900 2023-11-26 18:00:47,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3505973.3333333335, ans=0.125 2023-11-26 18:00:48,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3505973.3333333335, ans=0.1 2023-11-26 18:01:02,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3506106.6666666665, ans=0.125 2023-11-26 18:01:08,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3506106.6666666665, ans=0.125 2023-11-26 18:01:08,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3506106.6666666665, ans=0.125 2023-11-26 18:01:12,216 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8900, loss[loss=0.08655, simple_loss=0.1241, pruned_loss=0.01695, audio_tagging_loss=0.00753, over 15184.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09186, pruned_loss=0.01256, audio_tagging_loss=0.008632, over 3046843.39 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 18:01:18,742 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.05 vs. 
limit=12.0 2023-11-26 18:01:21,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3506173.3333333335, ans=0.125 2023-11-26 18:01:24,350 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.622e+01 8.724e+01 9.341e+01 9.971e+01 1.167e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 18:01:32,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3506240.0, ans=0.125 2023-11-26 18:01:36,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3506306.6666666665, ans=0.0 2023-11-26 18:01:37,696 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 525950 2023-11-26 18:01:55,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3506440.0, ans=0.0 2023-11-26 18:02:05,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3506440.0, ans=0.125 2023-11-26 18:02:07,796 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 8950, loss[loss=0.05573, simple_loss=0.08096, pruned_loss=0.006489, audio_tagging_loss=0.00876, over 15892.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09191, pruned_loss=0.01247, audio_tagging_loss=0.008561, over 3047855.13 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:02:19,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3506573.3333333335, ans=0.0 2023-11-26 18:02:30,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3506640.0, ans=0.0 2023-11-26 18:02:30,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3506640.0, ans=0.125 2023-11-26 18:02:32,640 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526000 2023-11-26 18:02:43,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3506706.6666666665, ans=0.2 2023-11-26 18:03:02,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3506773.3333333335, ans=0.0 2023-11-26 18:03:04,593 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9000, loss[loss=0.05912, simple_loss=0.07811, pruned_loss=0.01011, audio_tagging_loss=0.009956, over 15573.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09181, pruned_loss=0.01257, audio_tagging_loss=0.008576, over 3048664.12 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:03:04,595 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-26 18:03:17,720 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([4.4708, 3.9882, 3.6030, 4.0857, 3.7568, 3.9003, 4.0460, 3.5187], device='cuda:0') 2023-11-26 18:03:36,866 INFO [train_asr.py:1267] (0/4) Epoch 44, validation: loss=0.05857, simple_loss=0.05054, pruned_loss=0.005271, audio_tagging_loss=0.02803, over 4681554.00 frames. 
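Every valid_interval batches the loop pauses to compute a validation loss over the dev cuts, as in the records above; note the much larger audio_tagging_loss component (0.02803) relative to training, and the attention-entropy tensor dumped as an extra diagnostic. A sketch of such a validation pass, with placeholder names for the recipe's actual objects:

import torch

def compute_validation_loss(model, valid_loader, device):
    model.eval()
    tot_loss, tot_frames = 0.0, 0
    with torch.no_grad():
        for batch in valid_loader:
            feats = batch["inputs"].to(device)
            loss, num_frames = model(feats, batch["supervisions"])
            tot_loss += loss.item()   # summed (not averaged) batch loss
            tot_frames += num_frames
    model.train()
    return tot_loss / tot_frames      # per-frame loss, as in the log
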
2023-11-26 18:03:36,867 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-26 18:03:41,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3506840.0, ans=0.125 2023-11-26 18:03:50,483 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.728e+01 9.015e+01 9.647e+01 1.018e+02 1.400e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-26 18:04:01,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3506973.3333333335, ans=0.125 2023-11-26 18:04:02,248 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526050 2023-11-26 18:04:30,658 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.03 vs. limit=12.0 2023-11-26 18:04:32,809 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9050, loss[loss=0.07697, simple_loss=0.1065, pruned_loss=0.01488, audio_tagging_loss=0.008822, over 14825.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09108, pruned_loss=0.01245, audio_tagging_loss=0.008576, over 3056322.82 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:04:48,317 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.41 vs. limit=15.0 2023-11-26 18:04:50,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3507240.0, ans=0.0 2023-11-26 18:04:57,319 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526100 2023-11-26 18:05:03,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3507306.6666666665, ans=0.125 2023-11-26 18:05:13,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3507373.3333333335, ans=0.125 2023-11-26 18:05:16,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3507440.0, ans=0.125 2023-11-26 18:05:29,289 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9100, loss[loss=0.07161, simple_loss=0.09512, pruned_loss=0.01536, audio_tagging_loss=0.008688, over 15585.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09062, pruned_loss=0.01235, audio_tagging_loss=0.008493, over 3057896.58 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:05:41,911 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.792e+01 8.875e+01 9.431e+01 1.015e+02 1.268e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 18:05:47,978 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.77 vs. 
limit=6.0 2023-11-26 18:05:53,189 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526150 2023-11-26 18:05:53,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3507640.0, ans=0.125 2023-11-26 18:05:55,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3507640.0, ans=0.1 2023-11-26 18:06:06,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3507706.6666666665, ans=0.0 2023-11-26 18:06:09,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3507706.6666666665, ans=0.2 2023-11-26 18:06:17,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3507773.3333333335, ans=0.07 2023-11-26 18:06:24,510 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9150, loss[loss=0.07389, simple_loss=0.1046, pruned_loss=0.0119, audio_tagging_loss=0.009667, over 15151.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08933, pruned_loss=0.01203, audio_tagging_loss=0.008571, over 3059629.81 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:06:27,312 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.94 vs. limit=15.0 2023-11-26 18:06:41,280 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.32 vs. limit=15.0 2023-11-26 18:06:50,121 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526200 2023-11-26 18:06:55,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3507973.3333333335, ans=0.125 2023-11-26 18:06:58,221 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.04 vs. limit=15.0 2023-11-26 18:07:16,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3508106.6666666665, ans=0.2 2023-11-26 18:07:20,630 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9200, loss[loss=0.05987, simple_loss=0.08496, pruned_loss=0.0106, audio_tagging_loss=0.006793, over 16044.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.0893, pruned_loss=0.01205, audio_tagging_loss=0.008501, over 3063539.35 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:07:26,131 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.50 vs. 
limit=15.0 2023-11-26 18:07:30,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3508173.3333333335, ans=0.125 2023-11-26 18:07:34,944 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.545e+01 8.887e+01 9.428e+01 1.018e+02 1.949e+02, threshold=1.886e+02, percent-clipped=1.0 2023-11-26 18:07:45,618 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526250 2023-11-26 18:07:48,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3508306.6666666665, ans=0.07 2023-11-26 18:07:49,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3508306.6666666665, ans=0.1 2023-11-26 18:07:53,408 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.13 vs. limit=10.0 2023-11-26 18:07:58,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3508373.3333333335, ans=0.0 2023-11-26 18:08:00,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3508373.3333333335, ans=0.04949747468305833 2023-11-26 18:08:08,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3508440.0, ans=0.1 2023-11-26 18:08:10,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3508440.0, ans=0.125 2023-11-26 18:08:16,849 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.74 vs. limit=15.0 2023-11-26 18:08:17,402 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9250, loss[loss=0.0554, simple_loss=0.07574, pruned_loss=0.01057, audio_tagging_loss=0.006963, over 14243.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08922, pruned_loss=0.012, audio_tagging_loss=0.00855, over 3064280.91 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:08:21,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3508506.6666666665, ans=0.125 2023-11-26 18:08:40,194 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.46 vs. limit=15.0 2023-11-26 18:08:40,776 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526300 2023-11-26 18:08:43,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3508640.0, ans=0.125 2023-11-26 18:08:58,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3508706.6666666665, ans=0.0 2023-11-26 18:09:12,313 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9300, loss[loss=0.07646, simple_loss=0.1055, pruned_loss=0.01546, audio_tagging_loss=0.00826, over 14887.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08976, pruned_loss=0.01219, audio_tagging_loss=0.008664, over 3067211.48 frames. 
], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:09:17,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3508840.0, ans=0.125 2023-11-26 18:09:18,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3508840.0, ans=0.2 2023-11-26 18:09:25,101 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.636e+01 8.680e+01 9.500e+01 1.023e+02 1.279e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 18:09:37,837 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526350 2023-11-26 18:09:37,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3508973.3333333335, ans=0.0 2023-11-26 18:10:01,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3509106.6666666665, ans=0.125 2023-11-26 18:10:03,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3509106.6666666665, ans=0.125 2023-11-26 18:10:07,414 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9350, loss[loss=0.07971, simple_loss=0.101, pruned_loss=0.01738, audio_tagging_loss=0.01185, over 16373.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09039, pruned_loss=0.01235, audio_tagging_loss=0.008611, over 3065271.60 frames. ], batch size: 63, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:10:15,609 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.88 vs. limit=22.5 2023-11-26 18:10:30,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3509306.6666666665, ans=0.0 2023-11-26 18:10:33,007 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526400 2023-11-26 18:10:41,237 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.23 vs. limit=22.5 2023-11-26 18:10:43,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3509373.3333333335, ans=0.1 2023-11-26 18:10:44,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3509373.3333333335, ans=0.2 2023-11-26 18:10:53,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3509440.0, ans=0.125 2023-11-26 18:11:04,585 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9400, loss[loss=0.07355, simple_loss=0.09492, pruned_loss=0.01618, audio_tagging_loss=0.009914, over 14510.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09053, pruned_loss=0.01225, audio_tagging_loss=0.008666, over 3058175.38 frames. 
], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:11:17,204 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.876e+01 9.519e+01 1.023e+02 1.284e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-26 18:11:18,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3509573.3333333335, ans=0.125 2023-11-26 18:11:23,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3509573.3333333335, ans=0.125 2023-11-26 18:11:23,993 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.52 vs. limit=15.0 2023-11-26 18:11:27,907 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526450 2023-11-26 18:11:50,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3509773.3333333335, ans=0.1 2023-11-26 18:11:52,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3509773.3333333335, ans=0.125 2023-11-26 18:11:59,794 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9450, loss[loss=0.08028, simple_loss=0.1103, pruned_loss=0.01393, audio_tagging_loss=0.01118, over 14924.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09082, pruned_loss=0.01221, audio_tagging_loss=0.008757, over 3056036.38 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:12:00,896 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 18:12:01,355 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.01 vs. limit=15.0 2023-11-26 18:12:16,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3509906.6666666665, ans=0.125 2023-11-26 18:12:24,997 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526500 2023-11-26 18:12:34,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3510040.0, ans=0.125 2023-11-26 18:12:39,764 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.13 vs. limit=22.5 2023-11-26 18:12:55,192 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9500, loss[loss=0.07222, simple_loss=0.1004, pruned_loss=0.0154, audio_tagging_loss=0.0066, over 15313.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09067, pruned_loss=0.01242, audio_tagging_loss=0.00888, over 3058016.42 frames. 
], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:13:09,443 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.180e+01 8.837e+01 9.537e+01 1.034e+02 1.299e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-26 18:13:11,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3510240.0, ans=0.2 2023-11-26 18:13:13,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3510240.0, ans=0.125 2023-11-26 18:13:20,786 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526550 2023-11-26 18:13:35,680 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:13:42,568 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.74 vs. limit=15.0 2023-11-26 18:13:51,803 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9550, loss[loss=0.07662, simple_loss=0.1105, pruned_loss=0.01207, audio_tagging_loss=0.009312, over 16479.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.08992, pruned_loss=0.0123, audio_tagging_loss=0.008969, over 3053375.83 frames. ], batch size: 64, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:14:11,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3510573.3333333335, ans=0.2 2023-11-26 18:14:15,653 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526600 2023-11-26 18:14:32,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3510706.6666666665, ans=0.0 2023-11-26 18:14:47,781 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9600, loss[loss=0.06128, simple_loss=0.08464, pruned_loss=0.01232, audio_tagging_loss=0.006641, over 15328.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09055, pruned_loss=0.01233, audio_tagging_loss=0.008953, over 3053105.98 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 18:14:52,472 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.33 vs. limit=15.0 2023-11-26 18:14:58,775 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:15:00,586 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.506e+01 8.909e+01 9.426e+01 1.006e+02 1.618e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 18:15:07,652 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.32 vs. 
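limit=22.5

The optim.py:476 records summarize the distribution of recent gradient norms as quartiles (min, 25%, median, 75%, max). In every record here the threshold equals 2.0 times the logged median (e.g. 2.0 * 9.426e+01 = 1.885e+02), consistent with threshold = clipping_scale * median; percent-clipped then reports how often gradients exceeded it. The rolling-window bookkeeping below is a guess at the mechanics, not code from optim.py:

import torch

def clipping_report(recent_grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    # quartiles of the recent gradient norms, as printed in the log
    q = torch.quantile(recent_grad_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]  # scale times the median
    percent_clipped = 100.0 * (recent_grad_norms > threshold).float().mean()
    return q, threshold, percent_clipped
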
2023-11-26 18:15:11,654 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526650 2023-11-26 18:15:21,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3511040.0, ans=0.125 2023-11-26 18:15:30,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3511040.0, ans=0.125 2023-11-26 18:15:33,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3511106.6666666665, ans=0.125 2023-11-26 18:15:33,945 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.78 vs. limit=10.0 2023-11-26 18:15:43,133 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9650, loss[loss=0.05611, simple_loss=0.08349, pruned_loss=0.008401, audio_tagging_loss=0.005967, over 14429.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09014, pruned_loss=0.01223, audio_tagging_loss=0.008936, over 3054747.80 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 18:15:47,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3511173.3333333335, ans=0.0 2023-11-26 18:16:06,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3511306.6666666665, ans=0.1 2023-11-26 18:16:08,687 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526700 2023-11-26 18:16:13,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3511306.6666666665, ans=0.2 2023-11-26 18:16:16,746 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.03 vs. limit=15.0 2023-11-26 18:16:40,144 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9700, loss[loss=0.05564, simple_loss=0.07057, pruned_loss=0.01147, audio_tagging_loss=0.008887, over 16488.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09035, pruned_loss=0.01221, audio_tagging_loss=0.0087, over 3053309.74 frames. ], batch size: 65, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:16:42,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3511506.6666666665, ans=0.125 2023-11-26 18:16:44,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3511506.6666666665, ans=0.125 2023-11-26 18:16:54,486 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.934e+01 9.029e+01 9.553e+01 1.029e+02 1.538e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 18:17:04,058 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526750 2023-11-26 18:17:08,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3511640.0, ans=0.125 2023-11-26 18:17:09,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3511640.0, ans=0.1 2023-11-26 18:17:35,707 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9750, loss[loss=0.04275, simple_loss=0.05053, pruned_loss=0.007199, audio_tagging_loss=0.01028, over 13896.00 frames.
], tot_loss[loss=0.0662, simple_loss=0.09058, pruned_loss=0.01229, audio_tagging_loss=0.008624, over 3053540.33 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:17:39,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3511840.0, ans=0.1 2023-11-26 18:17:55,344 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:17:58,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3511973.3333333335, ans=0.125 2023-11-26 18:17:59,897 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526800 2023-11-26 18:18:08,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3512040.0, ans=0.125 2023-11-26 18:18:11,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3512040.0, ans=0.125 2023-11-26 18:18:31,618 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9800, loss[loss=0.07035, simple_loss=0.1039, pruned_loss=0.01004, audio_tagging_loss=0.008377, over 17096.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09106, pruned_loss=0.01235, audio_tagging_loss=0.008613, over 3050957.68 frames. ], batch size: 63, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:18:37,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3512173.3333333335, ans=0.0 2023-11-26 18:18:45,885 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.863e+01 9.415e+01 1.026e+02 1.443e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-26 18:18:56,648 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526850 2023-11-26 18:19:06,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3512373.3333333335, ans=0.1 2023-11-26 18:19:20,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3512440.0, ans=0.125 2023-11-26 18:19:22,544 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 18:19:27,290 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9850, loss[loss=0.04657, simple_loss=0.05765, pruned_loss=0.008179, audio_tagging_loss=0.009565, over 16880.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09122, pruned_loss=0.01238, audio_tagging_loss=0.008587, over 3050352.65 frames. 
], batch size: 66, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:19:34,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3512506.6666666665, ans=0.125 2023-11-26 18:19:48,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3512573.3333333335, ans=0.125 2023-11-26 18:19:52,212 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526900 2023-11-26 18:19:55,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3512640.0, ans=0.1 2023-11-26 18:20:23,495 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9900, loss[loss=0.08848, simple_loss=0.13, pruned_loss=0.01699, audio_tagging_loss=0.00647, over 15641.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09195, pruned_loss=0.01237, audio_tagging_loss=0.008458, over 3052876.55 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:20:23,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3512840.0, ans=0.0 2023-11-26 18:20:26,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3512840.0, ans=0.0 2023-11-26 18:20:36,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3512906.6666666665, ans=0.125 2023-11-26 18:20:37,752 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.419e+01 8.893e+01 9.553e+01 1.033e+02 1.550e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 18:20:47,269 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 526950 2023-11-26 18:20:50,562 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.76 vs. limit=22.5 2023-11-26 18:21:19,081 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 9950, loss[loss=0.08499, simple_loss=0.1115, pruned_loss=0.01841, audio_tagging_loss=0.01081, over 16426.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09178, pruned_loss=0.0123, audio_tagging_loss=0.008488, over 3056434.18 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:21:33,935 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.26 vs. limit=15.0 2023-11-26 18:21:40,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3513240.0, ans=10.0 2023-11-26 18:21:44,383 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527000 2023-11-26 18:22:03,834 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.59 vs. limit=12.0 2023-11-26 18:22:05,916 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.09 vs. limit=15.0 2023-11-26 18:22:13,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3513440.0, ans=0.0 2023-11-26 18:22:15,654 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10000, loss[loss=0.05118, simple_loss=0.0674, pruned_loss=0.008641, audio_tagging_loss=0.00884, over 15368.00 frames. 
], tot_loss[loss=0.06639, simple_loss=0.09121, pruned_loss=0.01232, audio_tagging_loss=0.008472, over 3055754.67 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 18:22:15,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3513506.6666666665, ans=0.0 2023-11-26 18:22:25,112 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.85 vs. limit=10.0 2023-11-26 18:22:29,971 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.250e+01 8.698e+01 9.339e+01 1.006e+02 1.184e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 18:22:30,174 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:22:31,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3513573.3333333335, ans=0.1 2023-11-26 18:22:40,331 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527050 2023-11-26 18:23:11,565 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10050, loss[loss=0.06188, simple_loss=0.08097, pruned_loss=0.01009, audio_tagging_loss=0.01131, over 14888.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09123, pruned_loss=0.01228, audio_tagging_loss=0.008502, over 3060557.36 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 18:23:13,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3513840.0, ans=0.0 2023-11-26 18:23:18,587 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.45 vs. limit=15.0 2023-11-26 18:23:35,621 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527100 2023-11-26 18:23:57,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3514106.6666666665, ans=0.125 2023-11-26 18:23:58,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3514106.6666666665, ans=0.2 2023-11-26 18:24:06,816 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10100, loss[loss=0.06969, simple_loss=0.09148, pruned_loss=0.01341, audio_tagging_loss=0.01053, over 14781.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09039, pruned_loss=0.01223, audio_tagging_loss=0.00858, over 3059188.93 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 18:24:10,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3514173.3333333335, ans=0.125 2023-11-26 18:24:21,135 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 8.926e+01 9.577e+01 1.044e+02 1.166e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-26 18:24:29,554 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.81 vs. limit=15.0 2023-11-26 18:24:32,312 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527150 2023-11-26 18:24:48,679 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.95 vs. 
limit=15.0 2023-11-26 18:24:53,878 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 18:24:58,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3514440.0, ans=0.0 2023-11-26 18:25:00,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3514440.0, ans=0.125 2023-11-26 18:25:02,355 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10150, loss[loss=0.06417, simple_loss=0.09718, pruned_loss=0.009036, audio_tagging_loss=0.006541, over 15267.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08981, pruned_loss=0.01227, audio_tagging_loss=0.008645, over 3051160.32 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:25:03,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3514506.6666666665, ans=0.125 2023-11-26 18:25:15,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3514573.3333333335, ans=0.2 2023-11-26 18:25:19,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3514573.3333333335, ans=0.0 2023-11-26 18:25:26,940 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527200 2023-11-26 18:25:27,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3514640.0, ans=0.0 2023-11-26 18:25:30,461 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 18:25:36,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3514706.6666666665, ans=0.1 2023-11-26 18:25:39,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3514706.6666666665, ans=0.125 2023-11-26 18:25:48,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3514773.3333333335, ans=0.0 2023-11-26 18:25:58,637 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10200, loss[loss=0.06794, simple_loss=0.09394, pruned_loss=0.01213, audio_tagging_loss=0.008832, over 15622.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09003, pruned_loss=0.01232, audio_tagging_loss=0.008743, over 3052268.43 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:26:01,453 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. 
limit=6.0 2023-11-26 18:26:12,687 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:26:13,396 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.250e+01 9.101e+01 9.803e+01 1.043e+02 1.180e+02, threshold=1.961e+02, percent-clipped=0.0 2023-11-26 18:26:20,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3514973.3333333335, ans=0.2 2023-11-26 18:26:21,415 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 18:26:22,529 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527250 2023-11-26 18:26:38,724 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.61 vs. limit=15.0 2023-11-26 18:26:53,044 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10250, loss[loss=0.04152, simple_loss=0.05128, pruned_loss=0.00588, audio_tagging_loss=0.01, over 14933.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08998, pruned_loss=0.01226, audio_tagging_loss=0.00882, over 3055204.70 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:26:54,391 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:27:18,174 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527300 2023-11-26 18:27:25,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3515373.3333333335, ans=0.0 2023-11-26 18:27:29,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3515373.3333333335, ans=0.0 2023-11-26 18:27:30,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3515373.3333333335, ans=0.025 2023-11-26 18:27:37,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3515440.0, ans=0.0 2023-11-26 18:27:42,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3515440.0, ans=0.2 2023-11-26 18:27:48,562 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10300, loss[loss=0.06171, simple_loss=0.07784, pruned_loss=0.01413, audio_tagging_loss=0.008659, over 15454.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.0896, pruned_loss=0.01223, audio_tagging_loss=0.00889, over 3060022.57 frames. 
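Note on the recurring "Exclude cut" WARNING entries: they follow from an alignment constraint. A 1-second AudioSet clip has 100 feature frames, about 23 after 4x subsampling, which is fewer than the 24 tokens of its dummy transcript, so a transducer loss has no valid alignment and the cut is dropped. A sketch of such a filter; the subsampled-length formula is an assumption chosen to match the logged 100 -> 23:

```python
# Hedged sketch of the filter behind the train_asr.py:1481 warnings: keep a
# cut only if its subsampled frame count can cover its token sequence.
# Helper name and the exact length formula are illustrative assumptions.
def keep_cut(num_frames: int, num_tokens: int, subsampling_factor: int = 4) -> bool:
    t = (num_frames - 7) // subsampling_factor  # rough conv-subsampling length
    return t >= num_tokens

print(keep_cut(100, 24))   # False -> excluded, as in the warnings above
print(keep_cut(1600, 24))  # True  -> a normal 16 s cut is kept
```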
], batch size: 57, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:27:48,812 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:27:49,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3515506.6666666665, ans=0.125 2023-11-26 18:27:50,062 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.93 vs. limit=15.0 2023-11-26 18:28:00,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3515573.3333333335, ans=0.125 2023-11-26 18:28:02,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3515573.3333333335, ans=0.1 2023-11-26 18:28:04,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3515573.3333333335, ans=0.125 2023-11-26 18:28:05,633 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.445e+01 8.902e+01 9.462e+01 1.025e+02 1.207e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-26 18:28:13,129 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527350 2023-11-26 18:28:17,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3515640.0, ans=0.125 2023-11-26 18:28:26,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3515706.6666666665, ans=0.125 2023-11-26 18:28:35,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3515773.3333333335, ans=0.0 2023-11-26 18:28:45,127 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10350, loss[loss=0.05884, simple_loss=0.07833, pruned_loss=0.007787, audio_tagging_loss=0.01189, over 14344.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08893, pruned_loss=0.01209, audio_tagging_loss=0.009026, over 3056084.48 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:28:52,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3515840.0, ans=0.125 2023-11-26 18:28:59,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3515906.6666666665, ans=10.0 2023-11-26 18:29:08,699 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527400 2023-11-26 18:29:18,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3516040.0, ans=0.1 2023-11-26 18:29:21,099 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.74 vs. limit=15.0 2023-11-26 18:29:22,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3516040.0, ans=0.125 2023-11-26 18:29:22,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3516040.0, ans=0.0 2023-11-26 18:29:24,084 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.91 vs. 
limit=15.0 2023-11-26 18:29:25,373 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.87 vs. limit=6.0 2023-11-26 18:29:38,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3516106.6666666665, ans=0.1 2023-11-26 18:29:40,531 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10400, loss[loss=0.07912, simple_loss=0.1056, pruned_loss=0.01848, audio_tagging_loss=0.007846, over 14632.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08895, pruned_loss=0.0121, audio_tagging_loss=0.009135, over 3048166.44 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:29:57,021 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.796e+01 9.067e+01 9.465e+01 1.042e+02 1.301e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 18:29:58,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3516240.0, ans=0.0 2023-11-26 18:30:05,660 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527450 2023-11-26 18:30:05,988 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.84 vs. limit=22.5 2023-11-26 18:30:17,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3516373.3333333335, ans=15.0 2023-11-26 18:30:35,798 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10450, loss[loss=0.06521, simple_loss=0.09218, pruned_loss=0.01047, audio_tagging_loss=0.008647, over 15487.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08948, pruned_loss=0.01227, audio_tagging_loss=0.009079, over 3048135.17 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:31:01,431 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527500 2023-11-26 18:31:03,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3516640.0, ans=0.125 2023-11-26 18:31:08,247 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.37 vs. limit=15.0 2023-11-26 18:31:21,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3516773.3333333335, ans=0.07 2023-11-26 18:31:23,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3516773.3333333335, ans=0.125 2023-11-26 18:31:24,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3516773.3333333335, ans=0.125 2023-11-26 18:31:26,090 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.18 vs. limit=10.0 2023-11-26 18:31:33,190 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10500, loss[loss=0.06248, simple_loss=0.08395, pruned_loss=0.01569, audio_tagging_loss=0.004813, over 14234.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08901, pruned_loss=0.01219, audio_tagging_loss=0.008939, over 3043154.91 frames. 
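Note on grad_scale in the batch summaries: it moves between 8.0, 16.0 and 32.0 over this section, the usual signature of dynamic fp16 loss scaling (halve on an overflowing step, grow back after a run of clean steps). A minimal sketch of that policy; the growth interval and factors are assumptions:

```python
# Hedged sketch of dynamic loss scaling, explaining why "grad_scale" in the
# per-batch lines steps between 8.0, 16.0 and 32.0 during fp16 training.
class LossScale:
    def __init__(self, scale=32.0, growth_interval=200, backoff=0.5, growth=2.0):
        self.scale, self.growth_interval = scale, growth_interval
        self.backoff, self.growth, self.clean_steps = backoff, growth, 0

    def update(self, found_inf: bool) -> None:
        if found_inf:
            self.scale *= self.backoff      # e.g. 32.0 -> 16.0
            self.clean_steps = 0
        else:
            self.clean_steps += 1
            if self.clean_steps == self.growth_interval:
                self.scale *= self.growth   # e.g. 16.0 -> 32.0
                self.clean_steps = 0

s = LossScale()
s.update(found_inf=True)
print(s.scale)  # 16.0
```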
], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:31:35,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3516840.0, ans=0.125 2023-11-26 18:31:36,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3516840.0, ans=0.1 2023-11-26 18:31:42,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3516906.6666666665, ans=0.025 2023-11-26 18:31:48,943 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.975e+01 9.604e+01 1.035e+02 1.568e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-26 18:31:51,631 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.57 vs. limit=15.0 2023-11-26 18:31:56,524 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527550 2023-11-26 18:32:04,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3517040.0, ans=0.2 2023-11-26 18:32:04,409 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.59 vs. limit=22.5 2023-11-26 18:32:26,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3517106.6666666665, ans=0.2 2023-11-26 18:32:26,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3517173.3333333335, ans=0.125 2023-11-26 18:32:27,964 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10550, loss[loss=0.07423, simple_loss=0.09521, pruned_loss=0.01781, audio_tagging_loss=0.008813, over 16018.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08919, pruned_loss=0.01221, audio_tagging_loss=0.008787, over 3047124.04 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:32:45,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3517240.0, ans=0.0 2023-11-26 18:32:52,366 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527600 2023-11-26 18:33:13,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3517440.0, ans=0.125 2023-11-26 18:33:13,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3517440.0, ans=0.0 2023-11-26 18:33:20,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3517440.0, ans=0.125 2023-11-26 18:33:22,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3517506.6666666665, ans=0.0 2023-11-26 18:33:23,268 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10600, loss[loss=0.07775, simple_loss=0.1151, pruned_loss=0.01401, audio_tagging_loss=0.006196, over 15575.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08858, pruned_loss=0.01212, audio_tagging_loss=0.008812, over 3045691.09 frames. 
], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:33:33,641 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:33:35,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3517573.3333333335, ans=0.1 2023-11-26 18:33:41,550 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.914e+01 8.748e+01 9.260e+01 9.948e+01 1.249e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-26 18:33:44,117 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0 2023-11-26 18:33:45,251 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.56 vs. limit=12.0 2023-11-26 18:33:49,067 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527650 2023-11-26 18:34:04,681 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.44 vs. limit=12.0 2023-11-26 18:34:20,602 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10650, loss[loss=0.0842, simple_loss=0.1142, pruned_loss=0.0186, audio_tagging_loss=0.008478, over 15093.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08886, pruned_loss=0.01207, audio_tagging_loss=0.008758, over 3040764.41 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:34:28,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3517840.0, ans=0.09899494936611666 2023-11-26 18:34:29,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3517840.0, ans=0.2 2023-11-26 18:34:44,376 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527700 2023-11-26 18:34:53,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3518040.0, ans=0.125 2023-11-26 18:35:06,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3518106.6666666665, ans=0.125 2023-11-26 18:35:10,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3518106.6666666665, ans=0.0 2023-11-26 18:35:14,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3518106.6666666665, ans=0.125 2023-11-26 18:35:16,068 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10700, loss[loss=0.07061, simple_loss=0.09856, pruned_loss=0.014, audio_tagging_loss=0.007322, over 16070.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08809, pruned_loss=0.01212, audio_tagging_loss=0.008672, over 3034347.07 frames. 
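Note on the scaling.py:1022 Whitening entries: each compares a per-module metric against a limit (e.g. metric=13.08 vs. limit=15.0 above), the idea being that a whitening penalty should bite only when a module's channel covariance drifts far from white. One plausible whiteness metric, the ratio of the mean squared eigenvalue to the squared mean eigenvalue of the covariance, is sketched below; the exact icefall formula may differ:

```python
# Hedged sketch of a whiteness metric: 1.0 for a perfectly white covariance,
# larger when energy concentrates in few directions. Illustrative only; the
# metric actually logged by scaling.py may be computed differently.
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

x = torch.randn(1000, 384)
m = whitening_metric(x)
print(f"metric={m:.2f} vs. limit=15.0, penalized={m > 15.0}")
```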
], batch size: 61, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:35:19,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3518173.3333333335, ans=0.125 2023-11-26 18:35:32,093 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.164e+01 8.938e+01 9.482e+01 1.003e+02 1.497e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-26 18:35:40,327 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527750 2023-11-26 18:35:40,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3518306.6666666665, ans=0.5 2023-11-26 18:35:47,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3518306.6666666665, ans=0.125 2023-11-26 18:35:48,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3518306.6666666665, ans=0.125 2023-11-26 18:35:54,877 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:35:55,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3518373.3333333335, ans=0.2 2023-11-26 18:35:57,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=3518373.3333333335, ans=15.0 2023-11-26 18:36:00,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3518440.0, ans=0.2 2023-11-26 18:36:08,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3518440.0, ans=0.0 2023-11-26 18:36:11,557 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10750, loss[loss=0.06895, simple_loss=0.09761, pruned_loss=0.01082, audio_tagging_loss=0.009325, over 14551.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08816, pruned_loss=0.01205, audio_tagging_loss=0.008661, over 3035667.48 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:36:21,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3518573.3333333335, ans=0.1 2023-11-26 18:36:36,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3518640.0, ans=0.1 2023-11-26 18:36:37,319 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527800 2023-11-26 18:36:42,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3518640.0, ans=0.1 2023-11-26 18:37:08,284 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10800, loss[loss=0.05944, simple_loss=0.08172, pruned_loss=0.01121, audio_tagging_loss=0.007369, over 16059.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08889, pruned_loss=0.01217, audio_tagging_loss=0.008677, over 3050071.24 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 18:37:08,979 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.62 vs. 
limit=15.0 2023-11-26 18:37:25,434 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 8.760e+01 9.363e+01 9.951e+01 1.169e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 18:37:26,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3518906.6666666665, ans=0.125 2023-11-26 18:37:27,908 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:37:33,057 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527850 2023-11-26 18:37:58,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3519106.6666666665, ans=0.0 2023-11-26 18:38:00,240 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.37 vs. limit=6.0 2023-11-26 18:38:04,873 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10850, loss[loss=0.05738, simple_loss=0.07927, pruned_loss=0.008043, audio_tagging_loss=0.009697, over 15015.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08927, pruned_loss=0.01199, audio_tagging_loss=0.008639, over 3052453.52 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:38:13,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3519173.3333333335, ans=0.125 2023-11-26 18:38:23,611 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.32 vs. limit=15.0 2023-11-26 18:38:28,374 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527900 2023-11-26 18:38:44,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3519373.3333333335, ans=0.0 2023-11-26 18:38:59,180 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 18:39:00,248 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10900, loss[loss=0.05886, simple_loss=0.08044, pruned_loss=0.008622, audio_tagging_loss=0.01002, over 15207.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08947, pruned_loss=0.01196, audio_tagging_loss=0.008632, over 3053833.57 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:39:00,725 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0 2023-11-26 18:39:18,334 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.607e+01 8.701e+01 9.472e+01 1.014e+02 1.998e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-26 18:39:25,357 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 527950 2023-11-26 18:39:33,617 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.67 vs. 
limit=15.0 2023-11-26 18:39:37,989 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.98 vs. limit=15.0 2023-11-26 18:39:55,870 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 10950, loss[loss=0.06024, simple_loss=0.08297, pruned_loss=0.009791, audio_tagging_loss=0.008968, over 15659.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.0895, pruned_loss=0.01206, audio_tagging_loss=0.008618, over 3051028.20 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:40:03,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3519840.0, ans=0.125 2023-11-26 18:40:07,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3519906.6666666665, ans=0.0 2023-11-26 18:40:10,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3519906.6666666665, ans=0.125 2023-11-26 18:40:14,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3519906.6666666665, ans=0.1 2023-11-26 18:40:21,097 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528000 2023-11-26 18:40:22,407 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-528000.pt 2023-11-26 18:40:32,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3520040.0, ans=0.95 2023-11-26 18:40:33,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3520040.0, ans=0.0 2023-11-26 18:40:42,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3520106.6666666665, ans=0.125 2023-11-26 18:40:54,855 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11000, loss[loss=0.06219, simple_loss=0.07732, pruned_loss=0.0117, audio_tagging_loss=0.01182, over 16858.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08886, pruned_loss=0.01196, audio_tagging_loss=0.00864, over 3053896.01 frames. ], batch size: 65, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:40:57,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3520173.3333333335, ans=0.0 2023-11-26 18:41:06,042 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
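Note on the checkpoint.py:75 entry above: checkpoint-528000.pt is written exactly when the global batch index reaches 528000, consistent with a save-every-N-batches rule (N=4000 fits the logged index). A sketch of that cadence, with illustrative names:

```python
# Hedged sketch of the checkpoint cadence: save whenever the global batch
# index is a multiple of save_every_n. The value 4000 is an assumption that
# fits batch idx 528000; the function name is illustrative.
from pathlib import Path

def maybe_save(batch_idx: int, exp_dir: Path, save_every_n: int = 4000):
    if batch_idx > 0 and batch_idx % save_every_n == 0:
        return exp_dir / f"checkpoint-{batch_idx}.pt"  # path that would be written
    return None

print(maybe_save(528000, Path("multi_KD/exp")))  # multi_KD/exp/checkpoint-528000.pt
print(maybe_save(528050, Path("multi_KD/exp")))  # None
```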
Number of tokens: 24 2023-11-26 18:41:12,352 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.170e+01 8.767e+01 9.294e+01 1.014e+02 1.282e+02, threshold=1.859e+02, percent-clipped=1.0 2023-11-26 18:41:18,938 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528050 2023-11-26 18:41:21,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3520306.6666666665, ans=0.125 2023-11-26 18:41:28,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3520373.3333333335, ans=0.125 2023-11-26 18:41:31,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3520373.3333333335, ans=0.125 2023-11-26 18:41:45,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3520440.0, ans=0.0 2023-11-26 18:41:46,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3520440.0, ans=0.125 2023-11-26 18:41:50,613 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11050, loss[loss=0.05157, simple_loss=0.06505, pruned_loss=0.009977, audio_tagging_loss=0.009069, over 15116.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08899, pruned_loss=0.01208, audio_tagging_loss=0.008714, over 3050948.52 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:42:03,822 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.27 vs. limit=15.0 2023-11-26 18:42:15,712 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528100 2023-11-26 18:42:27,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3520706.6666666665, ans=0.125 2023-11-26 18:42:28,718 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.33 vs. limit=15.0 2023-11-26 18:42:31,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3520706.6666666665, ans=0.0 2023-11-26 18:42:35,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3520773.3333333335, ans=0.1 2023-11-26 18:42:36,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3520773.3333333335, ans=0.1 2023-11-26 18:42:46,087 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11100, loss[loss=0.05301, simple_loss=0.0704, pruned_loss=0.007883, audio_tagging_loss=0.009931, over 15970.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08895, pruned_loss=0.01213, audio_tagging_loss=0.008855, over 3057293.46 frames. ], batch size: 62, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:42:55,834 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.63 vs. 
limit=12.0 2023-11-26 18:43:04,538 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 9.164e+01 1.008e+02 1.089e+02 1.427e+02, threshold=2.015e+02, percent-clipped=0.0 2023-11-26 18:43:11,698 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528150 2023-11-26 18:43:12,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3520973.3333333335, ans=0.125 2023-11-26 18:43:43,211 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11150, loss[loss=0.07136, simple_loss=0.1009, pruned_loss=0.01145, audio_tagging_loss=0.009478, over 15633.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08957, pruned_loss=0.01232, audio_tagging_loss=0.008878, over 3049564.67 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:44:07,142 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528200 2023-11-26 18:44:11,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3521306.6666666665, ans=0.125 2023-11-26 18:44:29,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3521440.0, ans=0.125 2023-11-26 18:44:39,046 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11200, loss[loss=0.06961, simple_loss=0.08757, pruned_loss=0.01697, audio_tagging_loss=0.008853, over 14392.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08863, pruned_loss=0.01217, audio_tagging_loss=0.009003, over 3042114.82 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:44:46,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3521506.6666666665, ans=0.0 2023-11-26 18:44:49,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3521573.3333333335, ans=0.07 2023-11-26 18:44:56,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3521573.3333333335, ans=0.0 2023-11-26 18:44:58,207 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 8.872e+01 9.383e+01 1.014e+02 1.200e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 18:45:04,149 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528250 2023-11-26 18:45:05,924 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.69 vs. limit=22.5 2023-11-26 18:45:14,167 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.44 vs. limit=10.0 2023-11-26 18:45:34,281 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11250, loss[loss=0.05871, simple_loss=0.08278, pruned_loss=0.009818, audio_tagging_loss=0.007505, over 15149.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08787, pruned_loss=0.01195, audio_tagging_loss=0.009013, over 3047927.63 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:45:34,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3521840.0, ans=10.0 2023-11-26 18:45:37,002 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.07 vs. 
limit=10.0 2023-11-26 18:45:41,221 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.63 vs. limit=15.0 2023-11-26 18:45:59,613 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528300 2023-11-26 18:46:04,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3521973.3333333335, ans=0.0 2023-11-26 18:46:17,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3522040.0, ans=0.5 2023-11-26 18:46:19,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3522106.6666666665, ans=0.125 2023-11-26 18:46:31,414 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11300, loss[loss=0.05378, simple_loss=0.07215, pruned_loss=0.008918, audio_tagging_loss=0.008785, over 14398.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08848, pruned_loss=0.01213, audio_tagging_loss=0.008868, over 3046736.49 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:46:40,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3522173.3333333335, ans=0.125 2023-11-26 18:46:48,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3522240.0, ans=0.1 2023-11-26 18:46:51,040 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.700e+01 8.924e+01 9.340e+01 1.002e+02 1.202e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 18:46:55,416 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528350 2023-11-26 18:46:56,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3522306.6666666665, ans=0.125 2023-11-26 18:47:26,472 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11350, loss[loss=0.06843, simple_loss=0.08707, pruned_loss=0.01516, audio_tagging_loss=0.00974, over 14851.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08846, pruned_loss=0.01215, audio_tagging_loss=0.008718, over 3048632.61 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:47:26,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3522506.6666666665, ans=0.2 2023-11-26 18:47:34,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3522506.6666666665, ans=0.125 2023-11-26 18:47:35,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3522506.6666666665, ans=0.0 2023-11-26 18:47:48,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3522640.0, ans=0.07 2023-11-26 18:47:48,341 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.02 vs. 
limit=10.0 2023-11-26 18:47:49,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3522640.0, ans=0.125 2023-11-26 18:47:51,582 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528400 2023-11-26 18:47:56,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3522640.0, ans=0.125 2023-11-26 18:47:57,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3522640.0, ans=0.125 2023-11-26 18:48:00,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3522706.6666666665, ans=0.0 2023-11-26 18:48:07,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3522706.6666666665, ans=0.0 2023-11-26 18:48:17,800 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.26 vs. limit=22.5 2023-11-26 18:48:22,541 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11400, loss[loss=0.07135, simple_loss=0.1088, pruned_loss=0.01151, audio_tagging_loss=0.005444, over 14877.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08913, pruned_loss=0.01227, audio_tagging_loss=0.008625, over 3038475.88 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:48:30,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3522840.0, ans=0.09899494936611666 2023-11-26 18:48:43,232 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.848e+01 9.001e+01 9.594e+01 1.048e+02 1.378e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-26 18:48:47,598 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528450 2023-11-26 18:48:53,157 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.75 vs. limit=15.0 2023-11-26 18:49:06,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3523106.6666666665, ans=0.1 2023-11-26 18:49:12,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3523106.6666666665, ans=0.125 2023-11-26 18:49:15,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3523106.6666666665, ans=0.2 2023-11-26 18:49:18,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=3523173.3333333335, ans=0.025 2023-11-26 18:49:19,512 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11450, loss[loss=0.0756, simple_loss=0.1075, pruned_loss=0.01587, audio_tagging_loss=0.005968, over 14958.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.089, pruned_loss=0.01226, audio_tagging_loss=0.008552, over 3042515.27 frames. 
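Note on the model.py:807 lines: every 50 batches the model logs whether the encoder is currently frozen; throughout this run it stays False, so encoder parameters keep receiving gradients. A hedged sketch of such a gate; the freeze-steps semantics (freeze only for the first N batches) is an assumption:

```python
# Hedged sketch of an encoder-freeze gate that emits the model.py:807 line.
# In this run freeze_encoder is False, so the log always shows "False".
import logging

logging.basicConfig(level=logging.INFO)

def encoder_is_frozen(batch_idx: int, freeze_encoder: bool,
                      freeze_steps: int = -1) -> bool:
    frozen = freeze_encoder and (freeze_steps < 0 or batch_idx < freeze_steps)
    if batch_idx % 50 == 0:
        logging.info("Freeze_encoder: %s; Current batch idx: %d",
                     frozen, batch_idx)
    return frozen

encoder_is_frozen(528450, freeze_encoder=False)
```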
], batch size: 54, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:49:19,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3523173.3333333335, ans=0.07 2023-11-26 18:49:22,098 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.88 vs. limit=15.0 2023-11-26 18:49:36,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3523240.0, ans=0.125 2023-11-26 18:49:42,823 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528500 2023-11-26 18:49:46,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3523306.6666666665, ans=0.125 2023-11-26 18:50:09,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3523440.0, ans=0.125 2023-11-26 18:50:11,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3523440.0, ans=0.125 2023-11-26 18:50:14,552 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11500, loss[loss=0.06876, simple_loss=0.09135, pruned_loss=0.01621, audio_tagging_loss=0.00688, over 14969.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08866, pruned_loss=0.01204, audio_tagging_loss=0.00858, over 3043991.17 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:50:16,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3523506.6666666665, ans=0.0 2023-11-26 18:50:21,045 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:50:21,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3523506.6666666665, ans=0.2 2023-11-26 18:50:34,136 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.429e+01 8.724e+01 9.278e+01 1.017e+02 1.417e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 18:50:39,565 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528550 2023-11-26 18:50:39,699 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:50:41,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3523640.0, ans=0.0 2023-11-26 18:50:54,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=3523706.6666666665, ans=0.05 2023-11-26 18:51:09,833 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11550, loss[loss=0.06456, simple_loss=0.08318, pruned_loss=0.01261, audio_tagging_loss=0.01036, over 15166.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08892, pruned_loss=0.01209, audio_tagging_loss=0.008616, over 3054405.81 frames. 
], batch size: 58, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:51:32,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3523973.3333333335, ans=0.1 2023-11-26 18:51:35,394 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528600 2023-11-26 18:51:45,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3524040.0, ans=0.2 2023-11-26 18:51:47,424 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 18:51:51,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3524040.0, ans=0.0 2023-11-26 18:52:07,221 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11600, loss[loss=0.05961, simple_loss=0.08036, pruned_loss=0.01277, audio_tagging_loss=0.006664, over 15633.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09026, pruned_loss=0.01238, audio_tagging_loss=0.008596, over 3052408.04 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:52:12,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=3524173.3333333335, ans=0.02 2023-11-26 18:52:26,811 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.281e+01 8.975e+01 9.642e+01 1.029e+02 1.280e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-26 18:52:31,134 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528650 2023-11-26 18:52:54,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3524440.0, ans=0.125 2023-11-26 18:53:02,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3524506.6666666665, ans=0.2 2023-11-26 18:53:02,900 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11650, loss[loss=0.06256, simple_loss=0.09018, pruned_loss=0.009691, audio_tagging_loss=0.007782, over 15004.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.0901, pruned_loss=0.01248, audio_tagging_loss=0.008604, over 3052254.97 frames. 
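Note on the headline loss in these batch summaries: it is consistent with a fixed linear combination of its parts, loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss (for batch 11650 above, 0.5 * 0.0901 + 0.01248 + 0.008604 = 0.06613). The scales in the sketch below are read off the logged numbers, not taken from the training code:

```python
# Hedged sketch of the loss combination implied by the logged totals.
def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float,
                  simple_scale: float = 0.5, at_scale: float = 1.0) -> float:
    return simple_scale * simple_loss + pruned_loss + at_scale * audio_tagging_loss

# Batch 11650 from the log above: reproduces loss=0.06613.
print(round(combined_loss(0.0901, 0.01248, 0.008604), 5))
```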
], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:53:05,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3524506.6666666665, ans=0.0 2023-11-26 18:53:18,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3524573.3333333335, ans=0.2 2023-11-26 18:53:27,501 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528700 2023-11-26 18:53:34,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3524640.0, ans=0.125 2023-11-26 18:53:43,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3524706.6666666665, ans=0.2 2023-11-26 18:53:57,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3524840.0, ans=0.0 2023-11-26 18:53:57,961 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11700, loss[loss=0.05598, simple_loss=0.07726, pruned_loss=0.009963, audio_tagging_loss=0.007387, over 14156.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08984, pruned_loss=0.01249, audio_tagging_loss=0.008719, over 3052740.37 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:54:19,354 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.034e+01 9.030e+01 9.498e+01 1.025e+02 1.677e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 18:54:22,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3524973.3333333335, ans=0.125 2023-11-26 18:54:23,685 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528750 2023-11-26 18:54:42,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3525106.6666666665, ans=0.2 2023-11-26 18:54:52,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3525106.6666666665, ans=0.1 2023-11-26 18:54:55,275 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11750, loss[loss=0.047, simple_loss=0.05641, pruned_loss=0.007071, audio_tagging_loss=0.01173, over 14626.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08897, pruned_loss=0.01231, audio_tagging_loss=0.008777, over 3049285.16 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:54:55,809 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.81 vs. limit=15.0 2023-11-26 18:55:14,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3525240.0, ans=0.0 2023-11-26 18:55:15,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3525240.0, ans=0.0 2023-11-26 18:55:19,240 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528800 2023-11-26 18:55:28,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3525373.3333333335, ans=0.0 2023-11-26 18:55:51,225 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11800, loss[loss=0.07446, simple_loss=0.1018, pruned_loss=0.01237, audio_tagging_loss=0.01119, over 14937.00 frames. 
], tot_loss[loss=0.06615, simple_loss=0.08982, pruned_loss=0.01243, audio_tagging_loss=0.008813, over 3043914.20 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:56:04,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3525573.3333333335, ans=0.125 2023-11-26 18:56:10,347 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.664e+01 8.967e+01 9.583e+01 1.033e+02 1.275e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-26 18:56:14,790 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528850 2023-11-26 18:56:32,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3525706.6666666665, ans=0.1 2023-11-26 18:56:34,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3525706.6666666665, ans=10.0 2023-11-26 18:56:37,649 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.04 vs. limit=15.0 2023-11-26 18:56:44,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3525773.3333333335, ans=0.2 2023-11-26 18:56:46,547 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11850, loss[loss=0.0618, simple_loss=0.07963, pruned_loss=0.01142, audio_tagging_loss=0.01057, over 15971.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08995, pruned_loss=0.01233, audio_tagging_loss=0.008864, over 3053853.81 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:57:00,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3525906.6666666665, ans=0.0 2023-11-26 18:57:00,333 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.99 vs. limit=12.0 2023-11-26 18:57:10,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3525973.3333333335, ans=0.0 2023-11-26 18:57:12,168 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528900 2023-11-26 18:57:13,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3525973.3333333335, ans=0.125 2023-11-26 18:57:42,553 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11900, loss[loss=0.04288, simple_loss=0.05403, pruned_loss=0.00713, audio_tagging_loss=0.008739, over 15167.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08976, pruned_loss=0.01216, audio_tagging_loss=0.008938, over 3053464.73 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:57:45,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3526173.3333333335, ans=0.125 2023-11-26 18:57:50,692 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.01 vs. 
limit=15.0 2023-11-26 18:57:54,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3526240.0, ans=0.09899494936611666 2023-11-26 18:58:02,717 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 8.907e+01 9.565e+01 1.014e+02 1.302e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-26 18:58:06,178 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:58:07,188 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 528950 2023-11-26 18:58:22,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3526373.3333333335, ans=0.125 2023-11-26 18:58:34,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3526440.0, ans=0.125 2023-11-26 18:58:39,303 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 11950, loss[loss=0.06613, simple_loss=0.08386, pruned_loss=0.0106, audio_tagging_loss=0.0136, over 14906.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08943, pruned_loss=0.01214, audio_tagging_loss=0.008923, over 3058038.93 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:58:47,403 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2023-11-26 18:59:02,681 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529000 2023-11-26 18:59:04,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3526640.0, ans=0.125 2023-11-26 18:59:24,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3526773.3333333335, ans=0.125 2023-11-26 18:59:24,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3526773.3333333335, ans=0.125 2023-11-26 18:59:26,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3526773.3333333335, ans=0.125 2023-11-26 18:59:34,102 INFO [train_asr.py:1235] (0/4) Epoch 44, batch 12000, loss[loss=0.05664, simple_loss=0.07366, pruned_loss=0.01204, audio_tagging_loss=0.007768, over 15300.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08923, pruned_loss=0.01204, audio_tagging_loss=0.009025, over 3054868.53 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 18:59:34,104 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-26 19:00:06,918 INFO [train_asr.py:1267] (0/4) Epoch 44, validation: loss=0.05801, simple_loss=0.05056, pruned_loss=0.005309, audio_tagging_loss=0.02742, over 4681554.00 frames. 
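
At fixed batch intervals training pauses, the "Computing validation loss" record is emitted, and the whole dev set (4681554 frames here) is scored before the peak-memory line is printed. A minimal sketch of that loop; compute_loss is a hypothetical callable supplied by the trainer:

    import torch

    def run_validation(model, valid_dl, compute_loss,
                       device: torch.device) -> dict:
        model.eval()
        totals: dict = {}
        with torch.no_grad():
            for batch in valid_dl:
                for name, value in compute_loss(model, batch).items():
                    totals[name] = totals.get(name, 0.0) + float(value)
        model.train()
        mb = torch.cuda.max_memory_allocated(device) // 2**20
        print(f"Maximum memory allocated so far is {mb}MB")
        return totals
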
2023-11-26 19:00:06,919 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-26 19:00:13,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3526840.0, ans=0.0 2023-11-26 19:00:25,342 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.461e+01 8.909e+01 9.466e+01 1.042e+02 1.234e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 19:00:29,535 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529050 2023-11-26 19:00:35,076 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-44.pt 2023-11-26 19:01:05,879 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 0, loss[loss=0.07514, simple_loss=0.09169, pruned_loss=0.008291, audio_tagging_loss=0.02101, over 15067.00 frames. ], tot_loss[loss=0.07514, simple_loss=0.09169, pruned_loss=0.008291, audio_tagging_loss=0.02101, over 15067.00 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 32.0 2023-11-26 19:01:05,882 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-26 19:01:37,709 INFO [train_asr.py:1267] (0/4) Epoch 45, validation: loss=0.05755, simple_loss=0.05055, pruned_loss=0.005302, audio_tagging_loss=0.02697, over 4681554.00 frames. 2023-11-26 19:01:37,710 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-26 19:01:53,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3527080.0, ans=0.0 2023-11-26 19:01:57,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3527080.0, ans=0.0 2023-11-26 19:02:00,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3527146.6666666665, ans=0.1 2023-11-26 19:02:01,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3527146.6666666665, ans=0.125 2023-11-26 19:02:04,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3527146.6666666665, ans=0.05 2023-11-26 19:02:07,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3527146.6666666665, ans=0.0 2023-11-26 19:02:28,770 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529100 2023-11-26 19:02:32,924 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 50, loss[loss=0.0655, simple_loss=0.07808, pruned_loss=0.007704, audio_tagging_loss=0.01876, over 14826.00 frames. ], tot_loss[loss=0.07136, simple_loss=0.08634, pruned_loss=0.01106, audio_tagging_loss=0.01713, over 689851.86 frames. 
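
The record above saves epoch-44.pt into the experiment directory right before epoch 45 begins. A sketch of what such an end-of-epoch checkpoint plausibly bundles so the run can resume exactly where it left off; the dict keys are assumptions, not necessarily what checkpoint.py writes:

    from pathlib import Path

    import torch

    def save_epoch_checkpoint(exp_dir: Path, epoch: int, model,
                              optimizer, scheduler, scaler) -> None:
        torch.save(
            {
                "model": model.state_dict(),          # weights
                "optimizer": optimizer.state_dict(),  # momentum etc.
                "scheduler": scheduler.state_dict(),  # lr schedule position
                "grad_scaler": scaler.state_dict(),   # fp16 loss scale
                "epoch": epoch,
            },
            exp_dir / f"epoch-{epoch}.pt",
        )
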
], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:02:36,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3527346.6666666665, ans=0.5 2023-11-26 19:02:39,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=3527346.6666666665, ans=0.05 2023-11-26 19:02:44,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3527413.3333333335, ans=0.035 2023-11-26 19:02:49,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3527413.3333333335, ans=0.0 2023-11-26 19:02:58,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3527480.0, ans=0.125 2023-11-26 19:03:09,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3527546.6666666665, ans=0.0 2023-11-26 19:03:12,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3527546.6666666665, ans=0.1 2023-11-26 19:03:20,569 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.836e+01 9.859e+01 1.043e+02 1.139e+02 1.375e+02, threshold=2.086e+02, percent-clipped=0.0 2023-11-26 19:03:23,764 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529150 2023-11-26 19:03:28,442 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 100, loss[loss=0.08301, simple_loss=0.1179, pruned_loss=0.01204, audio_tagging_loss=0.01203, over 16068.00 frames. ], tot_loss[loss=0.0714, simple_loss=0.08786, pruned_loss=0.01149, audio_tagging_loss=0.01598, over 1209432.08 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:03:29,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3527680.0, ans=0.0 2023-11-26 19:03:38,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3527746.6666666665, ans=0.035 2023-11-26 19:03:44,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3527746.6666666665, ans=0.0 2023-11-26 19:03:47,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3527746.6666666665, ans=0.125 2023-11-26 19:03:47,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3527746.6666666665, ans=0.125 2023-11-26 19:03:59,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3527813.3333333335, ans=0.125 2023-11-26 19:04:00,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3527880.0, ans=0.0 2023-11-26 19:04:01,675 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.59 vs. 
limit=22.5 2023-11-26 19:04:03,659 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:04:08,093 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2023-11-26 19:04:19,248 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529200 2023-11-26 19:04:23,706 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 150, loss[loss=0.06418, simple_loss=0.0887, pruned_loss=0.005603, audio_tagging_loss=0.01422, over 16352.00 frames. ], tot_loss[loss=0.07062, simple_loss=0.08902, pruned_loss=0.0117, audio_tagging_loss=0.01441, over 1613953.00 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:04:36,604 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.91 vs. limit=12.0 2023-11-26 19:04:41,649 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.96 vs. limit=10.0 2023-11-26 19:04:43,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3528080.0, ans=0.0 2023-11-26 19:04:44,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3528146.6666666665, ans=0.0 2023-11-26 19:04:49,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3528146.6666666665, ans=0.125 2023-11-26 19:04:57,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3528213.3333333335, ans=0.025 2023-11-26 19:04:58,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3528213.3333333335, ans=0.2 2023-11-26 19:05:08,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3528280.0, ans=0.125 2023-11-26 19:05:11,745 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.392e+01 9.212e+01 9.845e+01 1.053e+02 1.367e+02, threshold=1.969e+02, percent-clipped=0.0 2023-11-26 19:05:14,990 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529250 2023-11-26 19:05:19,235 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 200, loss[loss=0.0748, simple_loss=0.1008, pruned_loss=0.01572, audio_tagging_loss=0.008707, over 15417.00 frames. ], tot_loss[loss=0.06976, simple_loss=0.08999, pruned_loss=0.01198, audio_tagging_loss=0.01278, over 1929449.85 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:05:19,807 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.93 vs. 
limit=15.0 2023-11-26 19:05:29,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3528413.3333333335, ans=0.125 2023-11-26 19:05:29,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3528413.3333333335, ans=0.0 2023-11-26 19:05:30,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3528413.3333333335, ans=0.125 2023-11-26 19:05:34,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3528413.3333333335, ans=0.2 2023-11-26 19:05:50,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3528480.0, ans=0.125 2023-11-26 19:05:56,620 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.57 vs. limit=15.0 2023-11-26 19:05:59,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3528546.6666666665, ans=0.2 2023-11-26 19:06:07,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3528613.3333333335, ans=0.125 2023-11-26 19:06:09,728 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529300 2023-11-26 19:06:13,963 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 250, loss[loss=0.06651, simple_loss=0.09182, pruned_loss=0.01093, audio_tagging_loss=0.00967, over 15475.00 frames. ], tot_loss[loss=0.06917, simple_loss=0.09102, pruned_loss=0.01215, audio_tagging_loss=0.01152, over 2171355.47 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:06:15,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3528680.0, ans=0.125 2023-11-26 19:06:18,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3528680.0, ans=0.1 2023-11-26 19:06:22,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3528680.0, ans=0.125 2023-11-26 19:06:23,987 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.22 vs. limit=15.0 2023-11-26 19:07:00,447 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.92 vs. limit=15.0 2023-11-26 19:07:01,942 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.095e+01 9.038e+01 9.703e+01 1.049e+02 1.454e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-26 19:07:05,754 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529350 2023-11-26 19:07:08,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3528946.6666666665, ans=0.2 2023-11-26 19:07:09,932 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 300, loss[loss=0.05372, simple_loss=0.07051, pruned_loss=0.007647, audio_tagging_loss=0.01082, over 15821.00 frames. ], tot_loss[loss=0.06845, simple_loss=0.09092, pruned_loss=0.01232, audio_tagging_loss=0.01067, over 2365696.04 frames. 
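
The ubiquitous "ScheduledFloat: name=..., batch_count=..., ans=..." records report regularization knobs (dropout probabilities, skip rates, scale_min values) that are functions of the global batch count rather than constants. By batch_count ~3.5e6 every schedule has long since reached its final value, which is why each name always prints the same ans. A sketch of a piecewise-linear schedule of this kind; the constructor interface is an assumption:

    class ScheduledFloat:
        """A float hyperparameter interpolated linearly in batch_count."""

        def __init__(self, *points: tuple) -> None:
            self.points = sorted(points)  # (batch_count, value) pairs

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
            return pts[-1][1]

    # Fully annealed by batch_count ~3.5e6, hence constant values like ans=0.1:
    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    assert dropout_p.value(3528413.0) == 0.1
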
], batch size: 60, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:07:12,610 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.40 vs. limit=15.0 2023-11-26 19:07:19,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3529013.3333333335, ans=0.125 2023-11-26 19:07:22,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3529080.0, ans=0.1 2023-11-26 19:07:27,778 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:07:52,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3529213.3333333335, ans=0.125 2023-11-26 19:07:52,980 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.62 vs. limit=15.0 2023-11-26 19:08:00,838 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529400 2023-11-26 19:08:05,789 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 350, loss[loss=0.07779, simple_loss=0.1049, pruned_loss=0.01865, audio_tagging_loss=0.006693, over 14744.00 frames. ], tot_loss[loss=0.0681, simple_loss=0.09145, pruned_loss=0.01232, audio_tagging_loss=0.01006, over 2522749.30 frames. ], batch size: 52, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:08:24,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3529413.3333333335, ans=0.125 2023-11-26 19:08:27,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3529480.0, ans=0.125 2023-11-26 19:08:31,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3529480.0, ans=15.0 2023-11-26 19:08:31,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3529480.0, ans=0.125 2023-11-26 19:08:53,266 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.461e+01 8.894e+01 9.566e+01 1.035e+02 1.216e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-26 19:08:56,491 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529450 2023-11-26 19:09:00,687 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 400, loss[loss=0.06968, simple_loss=0.09083, pruned_loss=0.01419, audio_tagging_loss=0.01007, over 15113.00 frames. ], tot_loss[loss=0.06733, simple_loss=0.09078, pruned_loss=0.0123, audio_tagging_loss=0.009648, over 2641316.16 frames. 
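
The "[scaling.py:1118] WithLoss: name=..., loss-sum=0.000e+00" records come from wrappers that act as the identity in the forward pass while accumulating an auxiliary penalty on the tensors flowing through them; loss-sum=0.000e+00 means the wrapped attention weights currently incur no penalty. A rough sketch of the idea, not scaling.py's actual implementation:

    import torch

    class WithLoss(torch.nn.Module):
        """Identity wrapper that accumulates an auxiliary activation penalty."""

        def __init__(self, name: str, penalty_fn) -> None:
            super().__init__()
            self.name = name
            self.penalty_fn = penalty_fn  # hypothetical, e.g. a sparsity term
            self.loss_sum = 0.0

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # The penalty would be folded into the training loss elsewhere;
            # here we only track the running total that the log reports.
            self.loss_sum += float(self.penalty_fn(x).detach())
            return x
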
], batch size: 57, lr: 1.51e-03, grad_scale: 32.0 2023-11-26 19:09:05,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3529680.0, ans=0.125 2023-11-26 19:09:31,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3529813.3333333335, ans=0.1 2023-11-26 19:09:34,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=3529880.0, ans=12.0 2023-11-26 19:09:48,201 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:09:52,818 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529500 2023-11-26 19:09:54,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3529946.6666666665, ans=0.0 2023-11-26 19:09:55,867 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.37 vs. limit=15.0 2023-11-26 19:09:57,563 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 450, loss[loss=0.05791, simple_loss=0.07074, pruned_loss=0.01182, audio_tagging_loss=0.01072, over 15049.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.0905, pruned_loss=0.01222, audio_tagging_loss=0.009369, over 2730787.65 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 32.0 2023-11-26 19:10:46,554 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.397e+01 8.546e+01 9.095e+01 1.009e+02 1.358e+02, threshold=1.819e+02, percent-clipped=0.0 2023-11-26 19:10:48,798 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529550 2023-11-26 19:10:53,035 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 500, loss[loss=0.06848, simple_loss=0.09847, pruned_loss=0.01123, audio_tagging_loss=0.008012, over 14957.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09031, pruned_loss=0.01223, audio_tagging_loss=0.009262, over 2800534.57 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:10:54,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3530346.6666666665, ans=0.125 2023-11-26 19:11:07,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3530413.3333333335, ans=0.0 2023-11-26 19:11:13,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3530413.3333333335, ans=0.0 2023-11-26 19:11:28,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3530546.6666666665, ans=0.0 2023-11-26 19:11:36,129 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.46 vs. 
limit=22.5 2023-11-26 19:11:44,901 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529600 2023-11-26 19:11:46,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3530613.3333333335, ans=0.125 2023-11-26 19:11:48,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3530680.0, ans=0.0 2023-11-26 19:11:49,397 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 550, loss[loss=0.07024, simple_loss=0.09928, pruned_loss=0.01099, audio_tagging_loss=0.009618, over 15423.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09064, pruned_loss=0.01224, audio_tagging_loss=0.00906, over 2861059.59 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:12:11,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3530813.3333333335, ans=0.125 2023-11-26 19:12:19,237 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:12:25,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3530880.0, ans=0.2 2023-11-26 19:12:30,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3530880.0, ans=0.0 2023-11-26 19:12:35,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3530946.6666666665, ans=0.2 2023-11-26 19:12:39,403 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.657e+01 8.958e+01 9.518e+01 1.034e+02 1.414e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-26 19:12:41,688 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529650 2023-11-26 19:12:46,986 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 600, loss[loss=0.06124, simple_loss=0.08465, pruned_loss=0.01194, audio_tagging_loss=0.006973, over 14768.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08945, pruned_loss=0.01207, audio_tagging_loss=0.009039, over 2896217.65 frames. 
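
The "[scaling.py:1022] Whitening: ... metric=M vs. limit=L" records compare how far a layer's activations are from having a white (identity-proportional) covariance against a scheduled limit; a corrective gradient is only applied once the metric exceeds the limit, and the log shows comfortable margins (e.g. 14.46 vs. 22.5 just above). One plausible reading of the metric, equal to 1.0 for a perfectly white signal and growing with the eigenvalue spread:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (..., num_channels); flatten everything but the channel dim.
        x = x.reshape(-1, x.shape[-1]).float()
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.t() @ x) / x.shape[0]
        mean_eig = cov.diagonal().mean()             # E[lambda]
        mean_eig_sq = (cov @ cov).diagonal().mean()  # E[lambda^2]
        # E[lambda^2] / E[lambda]^2 == 1 iff all eigenvalues are equal.
        return float(mean_eig_sq / mean_eig.clamp_min(1e-20) ** 2)
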
], batch size: 57, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:12:52,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3531013.3333333335, ans=0.125 2023-11-26 19:13:08,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3531146.6666666665, ans=0.125 2023-11-26 19:13:10,526 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:13:12,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3531146.6666666665, ans=0.5 2023-11-26 19:13:12,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3531146.6666666665, ans=0.125 2023-11-26 19:13:17,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3531146.6666666665, ans=0.0 2023-11-26 19:13:29,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3531213.3333333335, ans=0.125 2023-11-26 19:13:31,612 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.86 vs. limit=15.0 2023-11-26 19:13:37,420 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529700 2023-11-26 19:13:38,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3531280.0, ans=0.125 2023-11-26 19:13:41,163 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.75 vs. limit=15.0 2023-11-26 19:13:41,631 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 650, loss[loss=0.07894, simple_loss=0.1079, pruned_loss=0.01616, audio_tagging_loss=0.008842, over 14664.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08998, pruned_loss=0.01228, audio_tagging_loss=0.008944, over 2925935.35 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:13:43,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3531346.6666666665, ans=0.0 2023-11-26 19:13:46,169 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:13:46,661 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.50 vs. 
limit=12.0 2023-11-26 19:13:51,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3531413.3333333335, ans=0.125 2023-11-26 19:14:27,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3531613.3333333335, ans=0.125 2023-11-26 19:14:29,928 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.866e+01 9.030e+01 9.559e+01 1.040e+02 1.405e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-26 19:14:32,206 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529750 2023-11-26 19:14:33,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3531613.3333333335, ans=0.1 2023-11-26 19:14:36,331 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 700, loss[loss=0.06497, simple_loss=0.0894, pruned_loss=0.01071, audio_tagging_loss=0.009555, over 15848.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08973, pruned_loss=0.01222, audio_tagging_loss=0.008903, over 2951553.08 frames. ], batch size: 61, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:14:41,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3531680.0, ans=0.0 2023-11-26 19:14:43,524 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:14:47,483 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.59 vs. limit=22.5 2023-11-26 19:14:55,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3531746.6666666665, ans=0.0 2023-11-26 19:14:57,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3531746.6666666665, ans=0.1 2023-11-26 19:15:20,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3531946.6666666665, ans=0.015 2023-11-26 19:15:27,939 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529800 2023-11-26 19:15:32,430 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 750, loss[loss=0.07543, simple_loss=0.1083, pruned_loss=0.0115, audio_tagging_loss=0.009804, over 15014.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09038, pruned_loss=0.0122, audio_tagging_loss=0.008851, over 2968000.98 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 8.0 2023-11-26 19:15:46,198 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.76 vs. 
limit=22.5 2023-11-26 19:15:52,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3532080.0, ans=0.1 2023-11-26 19:15:53,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3532146.6666666665, ans=0.125 2023-11-26 19:16:03,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3532146.6666666665, ans=0.2 2023-11-26 19:16:22,634 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.840e+01 9.480e+01 1.009e+02 1.765e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-26 19:16:23,755 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529850 2023-11-26 19:16:27,307 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.99 vs. limit=12.0 2023-11-26 19:16:27,892 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 800, loss[loss=0.0518, simple_loss=0.07008, pruned_loss=0.005279, audio_tagging_loss=0.01148, over 15094.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09159, pruned_loss=0.01236, audio_tagging_loss=0.008845, over 2983165.55 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:16:42,315 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.79 vs. limit=10.0 2023-11-26 19:16:47,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3532413.3333333335, ans=0.1 2023-11-26 19:17:15,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3532613.3333333335, ans=0.125 2023-11-26 19:17:16,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3532613.3333333335, ans=0.125 2023-11-26 19:17:18,530 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529900 2023-11-26 19:17:22,613 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 850, loss[loss=0.0784, simple_loss=0.1096, pruned_loss=0.01418, audio_tagging_loss=0.009426, over 14895.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09159, pruned_loss=0.01251, audio_tagging_loss=0.00879, over 2996379.47 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:17:32,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3532746.6666666665, ans=0.1 2023-11-26 19:17:38,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3532746.6666666665, ans=0.125 2023-11-26 19:17:39,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3532746.6666666665, ans=0.0 2023-11-26 19:18:11,998 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.322e+01 9.004e+01 9.716e+01 1.045e+02 1.656e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-26 19:18:13,157 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 529950 2023-11-26 19:18:18,470 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 900, loss[loss=0.08094, simple_loss=0.1124, pruned_loss=0.01703, audio_tagging_loss=0.007736, over 14952.00 frames. 
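
Each progress record prints a per-batch loss alongside its components, and the total is a weighted sum in which the simple (linear-interpolation) transducer loss is down-weighted. The weights below are inferred from the printed numbers themselves, e.g. batch 900 above: 0.5 * 0.1124 + 0.01703 + 0.007736 ~= 0.08094:

    def combined_loss(simple_loss: float, pruned_loss: float,
                      audio_tagging_loss: float,
                      simple_loss_scale: float = 0.5,
                      audio_tagging_loss_scale: float = 1.0) -> float:
        # Scales inferred from the logged numbers, not read from the code.
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)
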
], tot_loss[loss=0.06697, simple_loss=0.09133, pruned_loss=0.0125, audio_tagging_loss=0.008812, over 3004800.51 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:18:21,075 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.49 vs. limit=15.0 2023-11-26 19:18:24,855 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.56 vs. limit=15.0 2023-11-26 19:18:34,525 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.94 vs. limit=22.5 2023-11-26 19:18:45,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3533146.6666666665, ans=0.1 2023-11-26 19:19:09,839 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530000 2023-11-26 19:19:14,245 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 950, loss[loss=0.0587, simple_loss=0.08691, pruned_loss=0.007942, audio_tagging_loss=0.007306, over 14509.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09154, pruned_loss=0.01248, audio_tagging_loss=0.00875, over 3016032.70 frames. ], batch size: 52, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:19:30,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3533413.3333333335, ans=0.2 2023-11-26 19:19:32,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3533413.3333333335, ans=0.125 2023-11-26 19:19:38,326 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.48 vs. limit=15.0 2023-11-26 19:20:03,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3533613.3333333335, ans=0.2 2023-11-26 19:20:03,782 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.608e+01 8.669e+01 9.436e+01 1.000e+02 1.329e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 19:20:04,933 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530050 2023-11-26 19:20:09,193 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1000, loss[loss=0.06853, simple_loss=0.08806, pruned_loss=0.01611, audio_tagging_loss=0.008388, over 15843.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09121, pruned_loss=0.01233, audio_tagging_loss=0.008606, over 3024841.35 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:20:18,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3533746.6666666665, ans=0.2 2023-11-26 19:20:33,254 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 19:20:39,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3533813.3333333335, ans=0.2 2023-11-26 19:20:44,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3533880.0, ans=0.125 2023-11-26 19:21:00,079 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530100 2023-11-26 19:21:04,782 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1050, loss[loss=0.04922, simple_loss=0.06067, pruned_loss=0.005594, audio_tagging_loss=0.01329, over 14944.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09117, pruned_loss=0.01249, audio_tagging_loss=0.00857, over 3027003.39 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:21:12,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3534013.3333333335, ans=0.0 2023-11-26 19:21:23,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3534080.0, ans=0.125 2023-11-26 19:21:31,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3534146.6666666665, ans=0.0 2023-11-26 19:21:50,989 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.49 vs. limit=10.0 2023-11-26 19:21:54,699 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.578e+01 8.897e+01 9.575e+01 1.026e+02 1.368e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-26 19:21:55,824 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530150 2023-11-26 19:21:59,998 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1100, loss[loss=0.07336, simple_loss=0.1015, pruned_loss=0.01539, audio_tagging_loss=0.007215, over 15595.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08993, pruned_loss=0.01232, audio_tagging_loss=0.00856, over 3026755.74 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:22:00,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3534346.6666666665, ans=0.125 2023-11-26 19:22:01,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3534346.6666666665, ans=0.0 2023-11-26 19:22:02,126 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 19:22:04,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3534346.6666666665, ans=0.0 2023-11-26 19:22:10,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3534413.3333333335, ans=0.1 2023-11-26 19:22:18,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3534413.3333333335, ans=0.1 2023-11-26 19:22:18,436 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=22.5 2023-11-26 19:22:22,212 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.94 vs. limit=6.0 2023-11-26 19:22:29,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3534480.0, ans=0.125 2023-11-26 19:22:31,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3534546.6666666665, ans=0.0 2023-11-26 19:22:31,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3534546.6666666665, ans=0.0 2023-11-26 19:22:35,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3534546.6666666665, ans=0.0 2023-11-26 19:22:50,890 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530200 2023-11-26 19:22:55,386 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1150, loss[loss=0.07331, simple_loss=0.09413, pruned_loss=0.01699, audio_tagging_loss=0.009256, over 15223.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09059, pruned_loss=0.01228, audio_tagging_loss=0.008553, over 3030763.56 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:23:01,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3534680.0, ans=0.125 2023-11-26 19:23:09,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3534746.6666666665, ans=0.125 2023-11-26 19:23:30,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3534880.0, ans=0.0 2023-11-26 19:23:44,884 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.438e+01 8.885e+01 9.403e+01 1.020e+02 1.405e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 19:23:46,007 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530250 2023-11-26 19:23:50,185 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1200, loss[loss=0.06856, simple_loss=0.0957, pruned_loss=0.009731, audio_tagging_loss=0.01098, over 15164.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09115, pruned_loss=0.01231, audio_tagging_loss=0.00847, over 3025567.09 frames. 
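
Every progress record carries two readings: loss[...] for the current batch and tot_loss[...] over a running frame-weighted window. The fractional frame counts (3025567.09 just above) suggest the running totals decay geometrically instead of resetting, so one plausible sketch:

    class RunningLoss:
        """Frame-weighted running average with geometric decay (assumed)."""

        def __init__(self, decay: float = 0.999) -> None:
            self.decay = decay
            self.loss_frames = 0.0  # decayed sum of loss * frames
            self.frames = 0.0       # decayed sum of frames

        def update(self, loss: float, num_frames: float) -> None:
            self.loss_frames = self.decay * self.loss_frames + loss * num_frames
            self.frames = self.decay * self.frames + num_frames

        @property
        def value(self) -> float:
            return self.loss_frames / max(self.frames, 1.0)
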
], batch size: 57, lr: 1.51e-03, grad_scale: 32.0 2023-11-26 19:23:53,200 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:23:57,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3535013.3333333335, ans=0.04949747468305833 2023-11-26 19:23:59,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3535013.3333333335, ans=0.125 2023-11-26 19:24:00,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3535013.3333333335, ans=0.2 2023-11-26 19:24:07,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3535080.0, ans=0.07 2023-11-26 19:24:17,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3535146.6666666665, ans=0.125 2023-11-26 19:24:33,740 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.47 vs. limit=6.0 2023-11-26 19:24:43,518 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530300 2023-11-26 19:24:47,701 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1250, loss[loss=0.06071, simple_loss=0.07992, pruned_loss=0.01169, audio_tagging_loss=0.009059, over 14772.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.0907, pruned_loss=0.0123, audio_tagging_loss=0.008529, over 3027993.37 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 32.0 2023-11-26 19:25:02,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3535413.3333333335, ans=0.125 2023-11-26 19:25:06,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3535413.3333333335, ans=0.0 2023-11-26 19:25:23,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3535546.6666666665, ans=0.0 2023-11-26 19:25:38,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3535613.3333333335, ans=0.125 2023-11-26 19:25:38,826 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.083e+01 8.830e+01 9.575e+01 1.052e+02 2.949e+02, threshold=1.915e+02, percent-clipped=1.0 2023-11-26 19:25:38,938 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530350 2023-11-26 19:25:43,052 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1300, loss[loss=0.07364, simple_loss=0.09477, pruned_loss=0.01608, audio_tagging_loss=0.01017, over 14359.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09052, pruned_loss=0.01241, audio_tagging_loss=0.008551, over 3022946.23 frames. 
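
grad_scale in the progress records oscillates between 8.0, 16.0 and 32.0 (it reads 32.0 at batch 1200 and is back to 16.0 by batch 1300), which is the signature of dynamic fp16 loss scaling: the scale doubles after a run of overflow-free steps and is cut back when inf/nan gradients appear. A sketch using PyTorch's stock GradScaler; the constructor arguments and the compute_loss helper are illustrative:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=8.0, growth_interval=2000)

    def fp16_step(model, batch, optimizer, compute_loss) -> None:
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = compute_loss(model, batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)  # skipped if inf/nan gradients are detected
        scaler.update()         # x2 after enough clean steps, /2 on overflow
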
], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:26:07,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3535813.3333333335, ans=0.09899494936611666 2023-11-26 19:26:23,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3535880.0, ans=0.2 2023-11-26 19:26:33,584 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530400 2023-11-26 19:26:38,073 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1350, loss[loss=0.06765, simple_loss=0.08189, pruned_loss=0.0128, audio_tagging_loss=0.0139, over 15491.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08999, pruned_loss=0.01233, audio_tagging_loss=0.008593, over 3029409.37 frames. ], batch size: 60, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:26:40,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3536013.3333333335, ans=6.0 2023-11-26 19:26:43,391 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.02 vs. limit=15.0 2023-11-26 19:26:46,576 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.20 vs. limit=10.0 2023-11-26 19:26:51,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3536080.0, ans=0.0 2023-11-26 19:27:18,137 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 19:27:30,510 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.624e+01 8.761e+01 9.344e+01 1.009e+02 1.308e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 19:27:30,605 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530450 2023-11-26 19:27:34,946 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1400, loss[loss=0.0654, simple_loss=0.09091, pruned_loss=0.01208, audio_tagging_loss=0.007867, over 14343.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08975, pruned_loss=0.01219, audio_tagging_loss=0.008708, over 3041988.46 frames. ], batch size: 53, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:27:35,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3536346.6666666665, ans=0.1 2023-11-26 19:27:53,192 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.43 vs. limit=15.0 2023-11-26 19:27:55,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3536480.0, ans=0.125 2023-11-26 19:28:08,322 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.22 vs. 
limit=15.0 2023-11-26 19:28:21,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3536613.3333333335, ans=0.025 2023-11-26 19:28:22,668 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.12 vs. limit=5.0 2023-11-26 19:28:26,108 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530500 2023-11-26 19:28:30,898 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1450, loss[loss=0.06477, simple_loss=0.0936, pruned_loss=0.01103, audio_tagging_loss=0.006939, over 15301.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09123, pruned_loss=0.01238, audio_tagging_loss=0.008757, over 3049854.93 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:28:33,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=3536680.0, ans=0.1 2023-11-26 19:28:48,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=3536746.6666666665, ans=0.2 2023-11-26 19:28:48,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3536746.6666666665, ans=0.0 2023-11-26 19:29:15,084 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.10 vs. limit=15.0 2023-11-26 19:29:22,143 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.568e+01 8.955e+01 9.646e+01 1.028e+02 1.353e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-26 19:29:22,235 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530550 2023-11-26 19:29:26,588 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1500, loss[loss=0.05001, simple_loss=0.0594, pruned_loss=0.01085, audio_tagging_loss=0.009463, over 14751.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09065, pruned_loss=0.01237, audio_tagging_loss=0.008854, over 3051173.89 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:29:37,708 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.81 vs. limit=15.0 2023-11-26 19:29:45,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3537080.0, ans=0.035 2023-11-26 19:29:46,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3537080.0, ans=0.0 2023-11-26 19:29:48,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3537146.6666666665, ans=0.09899494936611666 2023-11-26 19:29:49,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3537146.6666666665, ans=0.5 2023-11-26 19:29:56,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3537146.6666666665, ans=0.0 2023-11-26 19:30:03,280 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.35 vs. 
limit=15.0 2023-11-26 19:30:05,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3537213.3333333335, ans=0.125 2023-11-26 19:30:18,377 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530600 2023-11-26 19:30:23,558 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1550, loss[loss=0.05603, simple_loss=0.07791, pruned_loss=0.008664, audio_tagging_loss=0.008411, over 14264.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09093, pruned_loss=0.01221, audio_tagging_loss=0.008887, over 3059520.85 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:30:31,664 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.42 vs. limit=15.0 2023-11-26 19:31:03,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3537546.6666666665, ans=0.035 2023-11-26 19:31:06,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3537546.6666666665, ans=0.04949747468305833 2023-11-26 19:31:12,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3537613.3333333335, ans=0.0 2023-11-26 19:31:13,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3537613.3333333335, ans=0.1 2023-11-26 19:31:14,236 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.522e+01 9.127e+01 9.575e+01 1.042e+02 1.304e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-26 19:31:14,325 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530650 2023-11-26 19:31:18,472 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1600, loss[loss=0.07929, simple_loss=0.1109, pruned_loss=0.01327, audio_tagging_loss=0.01057, over 14817.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09153, pruned_loss=0.01228, audio_tagging_loss=0.008863, over 3065350.14 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 32.0 2023-11-26 19:31:18,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3537680.0, ans=0.125 2023-11-26 19:31:20,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3537680.0, ans=0.2 2023-11-26 19:31:26,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3537680.0, ans=0.125 2023-11-26 19:31:42,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3537813.3333333335, ans=0.1 2023-11-26 19:31:56,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3537880.0, ans=0.125 2023-11-26 19:32:10,434 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530700 2023-11-26 19:32:14,670 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1650, loss[loss=0.07848, simple_loss=0.1107, pruned_loss=0.01546, audio_tagging_loss=0.007646, over 15053.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.0909, pruned_loss=0.01218, audio_tagging_loss=0.00895, over 3068697.41 frames. 
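
lr prints as 1.53e-03 late in epoch 44 and 1.51e-03 throughout epoch 45: the learning rate decays smoothly in both the batch and the epoch dimension. A sketch of an Eden-style schedule that reproduces these values; all constants here are assumptions:

    def eden_lr(base_lr: float, batch: float, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # With base_lr=0.045, batch ~530000 and epoch ~44 this yields
        # ~1.51e-03, close to the lr fields printed above.
        batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
        epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
        return base_lr * batch_factor * epoch_factor
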
], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:32:31,770 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.66 vs. limit=15.0 2023-11-26 19:32:31,809 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.34 vs. limit=15.0 2023-11-26 19:32:33,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3538080.0, ans=0.1 2023-11-26 19:32:35,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3538080.0, ans=0.125 2023-11-26 19:32:49,959 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.39 vs. limit=10.0 2023-11-26 19:32:59,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3538280.0, ans=0.125 2023-11-26 19:33:03,113 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.01 vs. limit=15.0 2023-11-26 19:33:06,441 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530750 2023-11-26 19:33:08,490 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.083e+01 8.661e+01 9.270e+01 1.008e+02 1.328e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-26 19:33:11,686 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1700, loss[loss=0.06409, simple_loss=0.08666, pruned_loss=0.01294, audio_tagging_loss=0.007818, over 16128.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.0905, pruned_loss=0.01221, audio_tagging_loss=0.008951, over 3065408.98 frames. ], batch size: 62, lr: 1.51e-03, grad_scale: 8.0 2023-11-26 19:33:26,259 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.02 vs. limit=15.0 2023-11-26 19:33:26,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3538413.3333333335, ans=0.0 2023-11-26 19:33:29,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3538413.3333333335, ans=0.125 2023-11-26 19:33:32,454 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.28 vs. limit=15.0 2023-11-26 19:33:40,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3538480.0, ans=0.1 2023-11-26 19:34:03,038 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530800 2023-11-26 19:34:06,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3538680.0, ans=0.1 2023-11-26 19:34:07,596 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1750, loss[loss=0.06312, simple_loss=0.0919, pruned_loss=0.01167, audio_tagging_loss=0.005502, over 14322.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09041, pruned_loss=0.01207, audio_tagging_loss=0.008901, over 3068323.07 frames. 
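
The per-batch loss lines above decompose into three terms, and the logged numbers are consistent with a fixed linear combination: loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss (e.g. for batch 1450 above, 0.5 * 0.0936 + 0.01103 + 0.006939 = 0.06477). A minimal sketch of that combination; the 0.5 and 1.0 scales are inferred from the printed values, not copied from train_asr.py:

```python
# Sketch: combine the logged loss components. The scales below are inferred
# from the numbers in the log, not taken from the training script.
SIMPLE_LOSS_SCALE = 0.5
AUDIO_TAGGING_LOSS_SCALE = 1.0

def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float) -> float:
    return (SIMPLE_LOSS_SCALE * simple_loss
            + pruned_loss
            + AUDIO_TAGGING_LOSS_SCALE * audio_tagging_loss)

# Reproduces the batch-1450 line: 0.5*0.0936 + 0.01103 + 0.006939 = 0.064769
assert abs(combined_loss(0.0936, 0.01103, 0.006939) - 0.06477) < 1e-4
```
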
], batch size: 53, lr: 1.51e-03, grad_scale: 8.0 2023-11-26 19:34:10,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3538680.0, ans=0.125 2023-11-26 19:34:12,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3538680.0, ans=0.0 2023-11-26 19:34:15,684 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.69 vs. limit=15.0 2023-11-26 19:34:28,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3538813.3333333335, ans=0.125 2023-11-26 19:34:37,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3538813.3333333335, ans=0.125 2023-11-26 19:34:58,827 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530850 2023-11-26 19:35:00,912 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.543e+01 9.072e+01 9.578e+01 1.039e+02 1.393e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-26 19:35:01,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3538946.6666666665, ans=0.125 2023-11-26 19:35:03,572 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1800, loss[loss=0.06324, simple_loss=0.08946, pruned_loss=0.01248, audio_tagging_loss=0.006037, over 15136.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09045, pruned_loss=0.01204, audio_tagging_loss=0.008774, over 3055052.49 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 8.0 2023-11-26 19:35:10,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3539013.3333333335, ans=0.05 2023-11-26 19:35:26,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3539146.6666666665, ans=0.125 2023-11-26 19:35:32,316 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:35:34,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3539146.6666666665, ans=0.125 2023-11-26 19:35:51,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3539280.0, ans=0.1 2023-11-26 19:35:55,088 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530900 2023-11-26 19:35:59,859 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1850, loss[loss=0.07334, simple_loss=0.09948, pruned_loss=0.01346, audio_tagging_loss=0.01014, over 16103.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.09025, pruned_loss=0.01199, audio_tagging_loss=0.008748, over 3056030.91 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 8.0 2023-11-26 19:36:24,981 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.68 vs. 
limit=10.0 2023-11-26 19:36:30,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3539480.0, ans=0.0 2023-11-26 19:36:32,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3539546.6666666665, ans=0.1 2023-11-26 19:36:39,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3539546.6666666665, ans=0.07 2023-11-26 19:36:47,602 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.81 vs. limit=10.0 2023-11-26 19:36:48,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3539613.3333333335, ans=0.07 2023-11-26 19:36:48,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3539613.3333333335, ans=0.07 2023-11-26 19:36:51,406 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 530950 2023-11-26 19:36:53,509 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.468e+01 8.894e+01 9.427e+01 1.010e+02 7.555e+02, threshold=1.885e+02, percent-clipped=1.0 2023-11-26 19:36:55,635 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1900, loss[loss=0.04896, simple_loss=0.06104, pruned_loss=0.00859, audio_tagging_loss=0.009849, over 14650.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.09046, pruned_loss=0.012, audio_tagging_loss=0.008708, over 3053484.13 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 8.0 2023-11-26 19:36:55,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3539680.0, ans=0.125 2023-11-26 19:37:19,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3539813.3333333335, ans=0.1 2023-11-26 19:37:27,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3539813.3333333335, ans=0.125 2023-11-26 19:37:37,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3539880.0, ans=0.125 2023-11-26 19:37:46,345 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531000 2023-11-26 19:37:50,783 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 1950, loss[loss=0.06267, simple_loss=0.09443, pruned_loss=0.009, audio_tagging_loss=0.006454, over 14743.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08992, pruned_loss=0.01193, audio_tagging_loss=0.008577, over 3048914.12 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 8.0 2023-11-26 19:37:53,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3540013.3333333335, ans=0.125 2023-11-26 19:37:55,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3540013.3333333335, ans=0.125 2023-11-26 19:38:05,407 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.52 vs. 
limit=22.5 2023-11-26 19:38:26,106 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2023-11-26 19:38:26,285 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.14 vs. limit=15.0 2023-11-26 19:38:30,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3540213.3333333335, ans=0.125 2023-11-26 19:38:42,143 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531050 2023-11-26 19:38:44,172 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.825e+01 8.819e+01 9.302e+01 9.928e+01 1.179e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-26 19:38:46,887 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2000, loss[loss=0.06946, simple_loss=0.1031, pruned_loss=0.0111, audio_tagging_loss=0.006798, over 15328.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.0885, pruned_loss=0.01171, audio_tagging_loss=0.008579, over 3047867.19 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:38:57,566 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.73 vs. limit=15.0 2023-11-26 19:39:12,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3540480.0, ans=0.125 2023-11-26 19:39:29,716 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.72 vs. limit=12.0 2023-11-26 19:39:31,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3540613.3333333335, ans=0.0 2023-11-26 19:39:38,327 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531100 2023-11-26 19:39:39,787 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.24 vs. limit=15.0 2023-11-26 19:39:42,511 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2050, loss[loss=0.09007, simple_loss=0.1179, pruned_loss=0.0227, audio_tagging_loss=0.008414, over 16973.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08848, pruned_loss=0.01184, audio_tagging_loss=0.008629, over 3051538.50 frames. ], batch size: 61, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:39:44,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3540680.0, ans=0.125 2023-11-26 19:39:58,233 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.83 vs. 
limit=15.0 2023-11-26 19:40:18,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3540880.0, ans=0.125 2023-11-26 19:40:20,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3540880.0, ans=0.1 2023-11-26 19:40:33,205 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531150 2023-11-26 19:40:35,281 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.661e+01 8.849e+01 9.497e+01 1.034e+02 1.158e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-26 19:40:37,436 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2100, loss[loss=0.06309, simple_loss=0.0977, pruned_loss=0.008799, audio_tagging_loss=0.005442, over 15132.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08946, pruned_loss=0.01211, audio_tagging_loss=0.008552, over 3054193.50 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:40:46,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3541013.3333333335, ans=0.05 2023-11-26 19:40:46,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3541013.3333333335, ans=0.2 2023-11-26 19:40:59,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3541080.0, ans=0.2 2023-11-26 19:41:01,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3541146.6666666665, ans=0.125 2023-11-26 19:41:12,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3541213.3333333335, ans=0.125 2023-11-26 19:41:14,521 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=22.5 2023-11-26 19:41:18,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3541213.3333333335, ans=0.0 2023-11-26 19:41:22,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3541280.0, ans=0.2 2023-11-26 19:41:28,838 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531200 2023-11-26 19:41:32,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3541346.6666666665, ans=0.1 2023-11-26 19:41:32,778 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.70 vs. limit=22.5 2023-11-26 19:41:32,800 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.31 vs. limit=22.5 2023-11-26 19:41:33,344 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2150, loss[loss=0.05903, simple_loss=0.08223, pruned_loss=0.007893, audio_tagging_loss=0.01002, over 16166.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08928, pruned_loss=0.01204, audio_tagging_loss=0.008626, over 3057663.23 frames. 
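
Each tot_loss[...] entry is reported "over N frames" with N around 3 million and slowly drifting, which suggests a frame-weighted running average over the batches seen so far rather than a plain per-batch mean. A sketch of such a tracker, assuming each batch contributes loss * num_frames to the numerator (the actual icefall MetricsTracker differs in detail):

```python
# Sketch: frame-weighted running average of the loss, assuming each batch
# contributes (loss * num_frames, num_frames) to the running totals.
class RunningLoss:
    def __init__(self) -> None:
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, loss: float, num_frames: float) -> None:
        self.loss_sum += loss * num_frames
        self.frames += num_frames

    @property
    def average(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = RunningLoss()
tracker.update(0.06477, 15301)  # batch-level values from the log above
print(f"tot_loss over {tracker.frames:.2f} frames: {tracker.average:.5f}")
```
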
], batch size: 60, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:41:50,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3541413.3333333335, ans=0.125 2023-11-26 19:41:52,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3541413.3333333335, ans=0.2 2023-11-26 19:41:52,479 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=22.5 2023-11-26 19:41:53,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3541413.3333333335, ans=0.125 2023-11-26 19:41:56,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3541480.0, ans=0.125 2023-11-26 19:41:58,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3541480.0, ans=0.1 2023-11-26 19:42:04,444 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.80 vs. limit=15.0 2023-11-26 19:42:06,941 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 19:42:11,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3541546.6666666665, ans=0.0 2023-11-26 19:42:26,170 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531250 2023-11-26 19:42:28,255 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.897e+01 9.501e+01 1.024e+02 1.712e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 19:42:30,383 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2200, loss[loss=0.07604, simple_loss=0.1009, pruned_loss=0.01605, audio_tagging_loss=0.009553, over 15285.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08973, pruned_loss=0.01231, audio_tagging_loss=0.008636, over 3050714.30 frames. 
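
The WARNING above shows why a cut gets dropped: its 100 input frames shrink to 23 after the convolutional front-end's subsampling, which is fewer than its 24 BPE tokens, so no valid transducer alignment exists. A sketch of that filter; the arithmetic ((T - 7) // 2 + 1) // 2 reproduces the logged 100 -> 23 but is an assumption about the exact front-end:

```python
# Sketch: drop cuts whose frame count after subsampling cannot cover the
# token sequence. The formula reproduces the logged 100 -> 23 mapping but is
# an assumption about the exact encoder front-end.
def frames_after_subsampling(num_frames: int) -> int:
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)  # the excluded dummy-text cut above
```
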
], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:43:01,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3541880.0, ans=0.0 2023-11-26 19:43:16,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3541946.6666666665, ans=0.1 2023-11-26 19:43:18,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3541946.6666666665, ans=0.125 2023-11-26 19:43:19,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3541946.6666666665, ans=0.125 2023-11-26 19:43:21,584 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531300 2023-11-26 19:43:25,855 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2250, loss[loss=0.05446, simple_loss=0.07265, pruned_loss=0.007662, audio_tagging_loss=0.01047, over 15942.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08911, pruned_loss=0.01231, audio_tagging_loss=0.008677, over 3048869.60 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:43:32,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3542013.3333333335, ans=0.125 2023-11-26 19:43:40,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3542080.0, ans=0.125 2023-11-26 19:43:45,299 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.75 vs. limit=15.0 2023-11-26 19:44:00,360 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:44:09,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3542280.0, ans=0.125 2023-11-26 19:44:13,557 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.46 vs. limit=15.0 2023-11-26 19:44:16,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3542280.0, ans=0.125 2023-11-26 19:44:17,343 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531350 2023-11-26 19:44:17,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3542280.0, ans=0.0 2023-11-26 19:44:19,358 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.901e+01 8.880e+01 9.630e+01 1.027e+02 2.263e+02, threshold=1.926e+02, percent-clipped=1.0 2023-11-26 19:44:20,995 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.94 vs. limit=10.0 2023-11-26 19:44:21,511 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2300, loss[loss=0.07224, simple_loss=0.1016, pruned_loss=0.01289, audio_tagging_loss=0.008538, over 15241.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08933, pruned_loss=0.01246, audio_tagging_loss=0.008634, over 3047581.22 frames. 
], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:44:38,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3542413.3333333335, ans=0.125 2023-11-26 19:44:40,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3542413.3333333335, ans=0.2 2023-11-26 19:44:49,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3542480.0, ans=0.1 2023-11-26 19:44:50,787 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.72 vs. limit=22.5 2023-11-26 19:45:03,522 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=15.0 2023-11-26 19:45:03,859 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.46 vs. limit=12.0 2023-11-26 19:45:11,705 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 19:45:14,417 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531400 2023-11-26 19:45:14,762 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=15.0 2023-11-26 19:45:18,981 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2350, loss[loss=0.07509, simple_loss=0.1038, pruned_loss=0.01427, audio_tagging_loss=0.008925, over 14262.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08821, pruned_loss=0.01214, audio_tagging_loss=0.00878, over 3047293.37 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:45:33,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3542746.6666666665, ans=0.0 2023-11-26 19:45:41,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3542813.3333333335, ans=0.125 2023-11-26 19:45:43,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3542813.3333333335, ans=0.2 2023-11-26 19:45:56,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3542880.0, ans=0.0 2023-11-26 19:46:06,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3542946.6666666665, ans=0.025 2023-11-26 19:46:10,731 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531450 2023-11-26 19:46:12,739 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.423e+01 8.878e+01 9.481e+01 9.940e+01 1.139e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-26 19:46:14,883 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2400, loss[loss=0.07818, simple_loss=0.1207, pruned_loss=0.01245, audio_tagging_loss=0.005397, over 15660.00 frames. 
], tot_loss[loss=0.06508, simple_loss=0.08829, pruned_loss=0.01211, audio_tagging_loss=0.00882, over 3043720.84 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 32.0 2023-11-26 19:46:19,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3543013.3333333335, ans=0.2 2023-11-26 19:46:22,915 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.97 vs. limit=15.0 2023-11-26 19:46:23,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3543013.3333333335, ans=0.025 2023-11-26 19:46:27,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3543080.0, ans=0.125 2023-11-26 19:46:44,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3543146.6666666665, ans=0.125 2023-11-26 19:47:05,837 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531500 2023-11-26 19:47:07,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3543280.0, ans=0.1 2023-11-26 19:47:10,002 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2450, loss[loss=0.05376, simple_loss=0.07212, pruned_loss=0.008763, audio_tagging_loss=0.008935, over 15016.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.0893, pruned_loss=0.01214, audio_tagging_loss=0.008865, over 3050853.73 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:47:19,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3543346.6666666665, ans=0.125 2023-11-26 19:47:26,943 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2023-11-26 19:47:32,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3543480.0, ans=0.0 2023-11-26 19:47:41,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3543480.0, ans=10.0 2023-11-26 19:48:02,279 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531550 2023-11-26 19:48:05,932 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 8.711e+01 9.383e+01 9.958e+01 1.270e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 19:48:06,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3543680.0, ans=0.125 2023-11-26 19:48:07,072 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2500, loss[loss=0.06315, simple_loss=0.08539, pruned_loss=0.009731, audio_tagging_loss=0.01073, over 15543.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.0887, pruned_loss=0.0121, audio_tagging_loss=0.008931, over 3052808.22 frames. 
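
In every optim.py line the threshold equals Clipping_scale times the middle quartile (e.g. 2.0 * 9.383e+01 = 1.877e+02 just above), so the clipping threshold evidently tracks a median of recent gradient norms, and percent-clipped reports how often that threshold was exceeded. A sketch of that bookkeeping, assuming a sliding window of per-batch norms:

```python
import collections
import statistics

# Sketch: keep a window of recent gradient norms; the clip threshold is
# clipping_scale * median, matching threshold = 2.0 * Q2 in the lines above.
class GradNormTracker:
    def __init__(self, clipping_scale: float = 2.0, window: int = 1000) -> None:
        self.clipping_scale = clipping_scale
        self.norms = collections.deque(maxlen=window)

    def update(self, grad_norm: float) -> float:
        """Record one batch's gradient norm; return the current clip threshold."""
        self.norms.append(grad_norm)
        if len(self.norms) < 4:
            return float("inf")          # not enough data to clip yet
        q1, q2, q3 = statistics.quantiles(self.norms, n=4)
        return self.clipping_scale * q2  # e.g. 2.0 * 9.383e+01 = 1.877e+02

tracker = GradNormTracker()
for norm in (75.0, 87.1, 93.8, 99.6, 127.0):
    threshold = tracker.update(norm)
print(threshold)  # roughly 2 * median of the recorded norms
```
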
], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:48:17,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3543746.6666666665, ans=0.0 2023-11-26 19:48:31,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3543813.3333333335, ans=0.1 2023-11-26 19:48:32,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3543813.3333333335, ans=0.0 2023-11-26 19:48:37,184 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.10 vs. limit=10.0 2023-11-26 19:48:58,200 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531600 2023-11-26 19:49:03,308 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2550, loss[loss=0.0766, simple_loss=0.1044, pruned_loss=0.01714, audio_tagging_loss=0.007256, over 15147.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08892, pruned_loss=0.01218, audio_tagging_loss=0.00887, over 3053671.94 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:49:07,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3544013.3333333335, ans=0.125 2023-11-26 19:49:12,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=3544013.3333333335, ans=6.0 2023-11-26 19:49:22,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3544080.0, ans=0.1 2023-11-26 19:49:25,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3544146.6666666665, ans=0.1 2023-11-26 19:49:43,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3544213.3333333335, ans=0.1 2023-11-26 19:49:48,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3544280.0, ans=0.0 2023-11-26 19:49:49,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3544280.0, ans=0.1 2023-11-26 19:49:54,034 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.73 vs. limit=22.5 2023-11-26 19:49:54,501 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531650 2023-11-26 19:49:57,593 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.308e+01 8.904e+01 9.624e+01 1.038e+02 1.426e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-26 19:49:58,729 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2600, loss[loss=0.07395, simple_loss=0.1025, pruned_loss=0.01463, audio_tagging_loss=0.008084, over 16069.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08956, pruned_loss=0.01222, audio_tagging_loss=0.008703, over 3051365.34 frames. 
], batch size: 61, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 19:50:14,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3544413.3333333335, ans=0.1 2023-11-26 19:50:18,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3544413.3333333335, ans=0.1 2023-11-26 19:50:21,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3544480.0, ans=0.0 2023-11-26 19:50:37,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3544546.6666666665, ans=0.125 2023-11-26 19:50:49,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3544613.3333333335, ans=0.2 2023-11-26 19:50:50,917 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531700 2023-11-26 19:50:52,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3544613.3333333335, ans=0.2 2023-11-26 19:50:56,174 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2650, loss[loss=0.0562, simple_loss=0.08112, pruned_loss=0.008336, audio_tagging_loss=0.007304, over 14861.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08949, pruned_loss=0.01215, audio_tagging_loss=0.008664, over 3049212.81 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 19:51:33,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3544880.0, ans=0.07 2023-11-26 19:51:45,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3544946.6666666665, ans=0.0 2023-11-26 19:51:47,371 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531750 2023-11-26 19:51:50,500 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.824e+01 9.475e+01 1.010e+02 1.281e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-26 19:51:51,587 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2700, loss[loss=0.06908, simple_loss=0.09448, pruned_loss=0.01327, audio_tagging_loss=0.00857, over 15425.00 frames. ], tot_loss[loss=0.06461, simple_loss=0.08791, pruned_loss=0.01196, audio_tagging_loss=0.008691, over 3046994.28 frames. 
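
The scaling.py lines each report a named ScheduledFloat together with the global batch_count and its current value (ans=...). The values are constant this late in training, which is consistent with a piecewise-linear schedule over batch_count that has long since reached its final segment. A minimal sketch of such a schedule; the (batch, value) breakpoints below are illustrative, not the ones used in this run:

```python
import bisect

# Sketch: a float whose value is piecewise-linear in the global batch count,
# like the ScheduledFloat values logged above. Breakpoints are illustrative.
class ScheduledFloatSketch:
    def __init__(self, *points: tuple) -> None:
        self.batches = [b for b, _ in points]
        self.values = [v for _, v in points]

    def __call__(self, batch_count: float) -> float:
        i = bisect.bisect_right(self.batches, batch_count)
        if i == 0:
            return self.values[0]
        if i == len(self.batches):
            return self.values[-1]       # past the last breakpoint: constant
        b0, b1 = self.batches[i - 1], self.batches[i]
        v0, v1 = self.values[i - 1], self.values[i]
        return v0 + (v1 - v0) * (batch_count - b0) / (b1 - b0)

skip_rate = ScheduledFloatSketch((0.0, 0.2), (4000.0, 0.05), (16000.0, 0.0))
print(skip_rate(3544413.33))  # far past the last breakpoint -> 0.0
```
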
], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 19:51:58,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3545013.3333333335, ans=0.125 2023-11-26 19:52:18,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3545146.6666666665, ans=0.2 2023-11-26 19:52:22,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3545146.6666666665, ans=0.0 2023-11-26 19:52:23,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3545146.6666666665, ans=0.125 2023-11-26 19:52:23,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3545146.6666666665, ans=0.0 2023-11-26 19:52:34,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3545280.0, ans=0.125 2023-11-26 19:52:38,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3545280.0, ans=0.2 2023-11-26 19:52:38,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3545280.0, ans=0.95 2023-11-26 19:52:40,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3545280.0, ans=0.0 2023-11-26 19:52:42,952 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531800 2023-11-26 19:52:44,862 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.65 vs. limit=10.0 2023-11-26 19:52:47,356 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2750, loss[loss=0.0573, simple_loss=0.08297, pruned_loss=0.009446, audio_tagging_loss=0.006367, over 15253.00 frames. ], tot_loss[loss=0.06433, simple_loss=0.08746, pruned_loss=0.01194, audio_tagging_loss=0.008654, over 3041557.45 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 19:53:03,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=3545413.3333333335, ans=22.5 2023-11-26 19:53:07,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3545413.3333333335, ans=0.07 2023-11-26 19:53:18,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3545480.0, ans=0.125 2023-11-26 19:53:19,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3545546.6666666665, ans=0.0 2023-11-26 19:53:20,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3545546.6666666665, ans=0.125 2023-11-26 19:53:27,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3545546.6666666665, ans=0.2 2023-11-26 19:53:34,580 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 19:53:37,781 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531850 2023-11-26 19:53:42,031 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.836e+01 8.910e+01 9.547e+01 1.021e+02 1.473e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-26 19:53:43,112 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2800, loss[loss=0.06077, simple_loss=0.08473, pruned_loss=0.01233, audio_tagging_loss=0.00608, over 15754.00 frames. ], tot_loss[loss=0.06442, simple_loss=0.0876, pruned_loss=0.01193, audio_tagging_loss=0.008694, over 3044264.69 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 19:53:50,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3545680.0, ans=0.125 2023-11-26 19:54:13,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3545813.3333333335, ans=0.0 2023-11-26 19:54:17,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3545880.0, ans=0.0 2023-11-26 19:54:34,814 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531900 2023-11-26 19:54:37,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3545946.6666666665, ans=0.0 2023-11-26 19:54:38,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3546013.3333333335, ans=0.125 2023-11-26 19:54:38,998 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2850, loss[loss=0.05837, simple_loss=0.07623, pruned_loss=0.01235, audio_tagging_loss=0.007904, over 14072.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08826, pruned_loss=0.01198, audio_tagging_loss=0.0086, over 3044692.37 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 19:54:43,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3546013.3333333335, ans=0.0 2023-11-26 19:55:13,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3546213.3333333335, ans=0.025 2023-11-26 19:55:23,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3546280.0, ans=0.2 2023-11-26 19:55:27,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3546280.0, ans=0.125 2023-11-26 19:55:30,548 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 531950 2023-11-26 19:55:30,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3546280.0, ans=0.1 2023-11-26 19:55:34,713 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 8.835e+01 9.315e+01 9.874e+01 1.722e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-26 19:55:34,741 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2900, loss[loss=0.07673, simple_loss=0.1001, pruned_loss=0.01766, audio_tagging_loss=0.009024, over 16000.00 frames. 
], tot_loss[loss=0.06431, simple_loss=0.08748, pruned_loss=0.0119, audio_tagging_loss=0.008669, over 3040739.39 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 19:55:38,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3546346.6666666665, ans=0.0 2023-11-26 19:55:53,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3546413.3333333335, ans=0.125 2023-11-26 19:55:59,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3546480.0, ans=0.1 2023-11-26 19:56:11,852 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.52 vs. limit=15.0 2023-11-26 19:56:21,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3546613.3333333335, ans=0.125 2023-11-26 19:56:26,186 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.86 vs. limit=22.5 2023-11-26 19:56:26,678 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532000 2023-11-26 19:56:28,015 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-532000.pt 2023-11-26 19:56:33,908 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 2950, loss[loss=0.06152, simple_loss=0.08827, pruned_loss=0.00977, audio_tagging_loss=0.007619, over 15176.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08847, pruned_loss=0.01198, audio_tagging_loss=0.008621, over 3046239.14 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 19:56:40,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3546680.0, ans=0.0 2023-11-26 19:56:45,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3546746.6666666665, ans=0.05 2023-11-26 19:56:47,177 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.26 vs. limit=15.0 2023-11-26 19:56:52,640 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.50 vs. limit=15.0 2023-11-26 19:57:25,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3546946.6666666665, ans=0.125 2023-11-26 19:57:26,198 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532050 2023-11-26 19:57:30,291 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 8.991e+01 9.608e+01 1.049e+02 1.344e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-26 19:57:30,321 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3000, loss[loss=0.05732, simple_loss=0.06612, pruned_loss=0.01084, audio_tagging_loss=0.01342, over 15969.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08945, pruned_loss=0.01215, audio_tagging_loss=0.008666, over 3045766.76 frames. 
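
The checkpoint above lands exactly at global batch 532000, so intermediate checkpoints are evidently written whenever the global batch index hits a round multiple of some save interval (4000 would fit the observed filename). A sketch of that trigger; the interval and the saved state contents are assumptions:

```python
import torch

# Sketch: save an intermediate checkpoint whenever the global batch index is
# a multiple of save_every_n. Interval and state contents are assumed.
def maybe_save_checkpoint(model, optimizer, batch_idx_train: int,
                          exp_dir: str, save_every_n: int = 4000) -> None:
    if batch_idx_train % save_every_n != 0:
        return
    path = f"{exp_dir}/checkpoint-{batch_idx_train}.pt"
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "batch_idx_train": batch_idx_train,
        },
        path,
    )
```
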
], batch size: 63, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 19:57:30,323 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-26 19:58:03,035 INFO [train_asr.py:1267] (0/4) Epoch 45, validation: loss=0.05745, simple_loss=0.05048, pruned_loss=0.005228, audio_tagging_loss=0.02698, over 4681554.00 frames. 2023-11-26 19:58:03,036 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-26 19:58:07,902 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.11 vs. limit=15.0 2023-11-26 19:58:10,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3547013.3333333335, ans=0.0 2023-11-26 19:58:53,753 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.99 vs. limit=12.0 2023-11-26 19:58:54,316 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532100 2023-11-26 19:59:00,138 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3050, loss[loss=0.07051, simple_loss=0.09466, pruned_loss=0.01611, audio_tagging_loss=0.007075, over 13339.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08953, pruned_loss=0.01221, audio_tagging_loss=0.008755, over 3042463.37 frames. ], batch size: 51, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 19:59:00,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3547346.6666666665, ans=0.125 2023-11-26 19:59:01,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3547346.6666666665, ans=0.125 2023-11-26 19:59:03,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3547346.6666666665, ans=0.125 2023-11-26 19:59:12,601 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=22.5 2023-11-26 19:59:20,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3547480.0, ans=0.125 2023-11-26 19:59:23,939 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:59:31,832 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 19:59:51,777 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532150 2023-11-26 19:59:55,898 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.630e+01 8.969e+01 9.484e+01 1.021e+02 1.234e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-26 19:59:55,929 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3100, loss[loss=0.06266, simple_loss=0.08715, pruned_loss=0.01239, audio_tagging_loss=0.0067, over 15093.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09027, pruned_loss=0.0124, audio_tagging_loss=0.008653, over 3046978.92 frames. 
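
At batch 3000 the loop pauses to compute a validation loss over the dev set and reports the peak CUDA memory; note the validation line obeys the same 0.5 * simple + pruned + audio_tagging composition as the training loss. A sketch of that step, assuming a no-grad pass over a dev dataloader with a frame-weighted average; the criterion signature and batch keys are assumptions:

```python
import torch

# Sketch: periodic validation pass. The criterion signature and the
# batch["inputs"] key are assumed, and the average is frame-weighted.
@torch.no_grad()
def compute_validation_loss(model, criterion, valid_dl, device) -> float:
    model.eval()
    loss_sum, frames = 0.0, 0.0
    for batch in valid_dl:
        feats = batch["inputs"].to(device)
        loss, num_frames = criterion(model, feats, batch)  # assumed signature
        loss_sum += loss.item() * num_frames
        frames += num_frames
    model.train()
    print(f"Maximum memory allocated so far is "
          f"{torch.cuda.max_memory_allocated(device) // 2**20}MB")
    return loss_sum / max(frames, 1.0)
```
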
], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:00:02,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3547680.0, ans=0.2 2023-11-26 20:00:05,972 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.51 vs. limit=15.0 2023-11-26 20:00:06,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3547746.6666666665, ans=0.0 2023-11-26 20:00:06,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3547746.6666666665, ans=0.125 2023-11-26 20:00:31,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3547880.0, ans=0.0 2023-11-26 20:00:47,252 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532200 2023-11-26 20:00:50,144 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.38 vs. limit=22.5 2023-11-26 20:00:51,692 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3150, loss[loss=0.07114, simple_loss=0.1038, pruned_loss=0.01049, audio_tagging_loss=0.008742, over 14537.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09045, pruned_loss=0.01231, audio_tagging_loss=0.00869, over 3050816.52 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:00:54,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3548013.3333333335, ans=0.0 2023-11-26 20:01:11,072 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 20:01:28,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3548213.3333333335, ans=0.125 2023-11-26 20:01:32,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3548213.3333333335, ans=0.125 2023-11-26 20:01:43,572 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532250 2023-11-26 20:01:47,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3548346.6666666665, ans=0.125 2023-11-26 20:01:48,384 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3200, loss[loss=0.07092, simple_loss=0.1014, pruned_loss=0.01145, audio_tagging_loss=0.008771, over 13755.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09123, pruned_loss=0.01235, audio_tagging_loss=0.008765, over 3051965.79 frames. ], batch size: 52, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:01:49,928 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.921e+01 8.870e+01 9.654e+01 1.076e+02 1.284e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-26 20:01:51,977 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.40 vs. 
limit=15.0 2023-11-26 20:01:54,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3548346.6666666665, ans=0.0 2023-11-26 20:02:16,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3548480.0, ans=0.125 2023-11-26 20:02:23,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3548546.6666666665, ans=0.1 2023-11-26 20:02:25,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3548546.6666666665, ans=0.125 2023-11-26 20:02:31,332 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 20:02:35,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3548613.3333333335, ans=0.0 2023-11-26 20:02:40,676 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532300 2023-11-26 20:02:41,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3548613.3333333335, ans=0.125 2023-11-26 20:02:42,314 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.74 vs. limit=15.0 2023-11-26 20:02:44,875 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3250, loss[loss=0.05197, simple_loss=0.06137, pruned_loss=0.01012, audio_tagging_loss=0.01116, over 14402.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09094, pruned_loss=0.0123, audio_tagging_loss=0.008839, over 3051690.16 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:02:53,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3548680.0, ans=0.0 2023-11-26 20:02:54,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3548746.6666666665, ans=0.0 2023-11-26 20:03:15,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3548813.3333333335, ans=0.125 2023-11-26 20:03:21,203 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 20:03:35,899 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532350 2023-11-26 20:03:37,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3548946.6666666665, ans=0.125 2023-11-26 20:03:40,040 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3300, loss[loss=0.07126, simple_loss=0.1023, pruned_loss=0.01344, audio_tagging_loss=0.00666, over 15818.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09028, pruned_loss=0.01216, audio_tagging_loss=0.008952, over 3047219.89 frames. 
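
The Whitening lines compare a per-module statistic against a limit (metric=2.74 vs. limit=15.0 above). A natural metric of this kind is 1.0 when the feature covariance is isotropic ("white") and grows as the eigenvalue spectrum concentrates, with a penalty presumably applied only when it exceeds the limit. A sketch of one such metric, mean(diag(C^2)) / mean(diag(C))^2 for the covariance C; this reproduces the qualitative behaviour but is not guaranteed to match scaling.py exactly:

```python
import torch

# Sketch: a whitening metric that is 1.0 for an isotropic covariance and
# grows as the spectrum concentrates. Not the exact scaling.py code.
def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels); treats num_groups=1 as in the lines above
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]                  # (C, C) feature covariance
    mean_diag = cov.diagonal().mean()
    mean_diag_sq = (cov @ cov).diagonal().mean()
    return (mean_diag_sq / (mean_diag ** 2 + 1e-20)).item()

white = torch.randn(10000, 256)
print(whitening_metric(white))                                  # ~1.0
print(whitening_metric(white * torch.linspace(0.1, 3.0, 256)))  # > 1.0
```
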
], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:03:41,085 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 8.931e+01 9.545e+01 1.032e+02 1.663e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-26 20:03:55,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3549080.0, ans=0.125 2023-11-26 20:04:19,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3549213.3333333335, ans=0.125 2023-11-26 20:04:30,994 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532400 2023-11-26 20:04:35,383 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3350, loss[loss=0.05673, simple_loss=0.0749, pruned_loss=0.01099, audio_tagging_loss=0.008294, over 15386.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.0901, pruned_loss=0.01218, audio_tagging_loss=0.008814, over 3044733.71 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 20:05:00,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3549480.0, ans=0.1 2023-11-26 20:05:02,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3549480.0, ans=0.125 2023-11-26 20:05:27,165 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532450 2023-11-26 20:05:31,384 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3400, loss[loss=0.0687, simple_loss=0.09833, pruned_loss=0.01392, audio_tagging_loss=0.005622, over 15014.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09047, pruned_loss=0.01233, audio_tagging_loss=0.008613, over 3042425.53 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 20:05:33,490 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.319e+01 8.780e+01 9.356e+01 1.019e+02 1.296e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 20:05:45,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3549746.6666666665, ans=0.0 2023-11-26 20:05:56,462 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.93 vs. limit=22.5 2023-11-26 20:06:11,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3549880.0, ans=0.0 2023-11-26 20:06:17,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3549946.6666666665, ans=0.05 2023-11-26 20:06:22,049 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532500 2023-11-26 20:06:26,267 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3450, loss[loss=0.05859, simple_loss=0.07063, pruned_loss=0.01111, audio_tagging_loss=0.01216, over 13624.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.09014, pruned_loss=0.01222, audio_tagging_loss=0.008569, over 3045512.00 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 20:06:30,866 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.80 vs. 
2023-11-26 20:06:38,221 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 20:06:39,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3550080.0, ans=0.1
2023-11-26 20:07:06,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3550213.3333333335, ans=0.1
2023-11-26 20:07:07,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3550213.3333333335, ans=0.125
2023-11-26 20:07:08,452 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.14 vs. limit=10.0
2023-11-26 20:07:17,326 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532550
2023-11-26 20:07:18,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3550280.0, ans=0.125
2023-11-26 20:07:21,478 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3500, loss[loss=0.06592, simple_loss=0.09243, pruned_loss=0.009344, audio_tagging_loss=0.01036, over 15740.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.09009, pruned_loss=0.01212, audio_tagging_loss=0.008678, over 3040872.22 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 8.0
2023-11-26 20:07:22,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3550346.6666666665, ans=0.1
2023-11-26 20:07:23,648 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.644e+01 8.941e+01 9.512e+01 1.027e+02 1.407e+02, threshold=1.902e+02, percent-clipped=0.0
2023-11-26 20:07:45,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3550480.0, ans=0.125
2023-11-26 20:07:50,163 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 20:07:59,177 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.70 vs. limit=15.0
2023-11-26 20:08:13,654 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.90 vs. limit=15.0
2023-11-26 20:08:14,249 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532600
2023-11-26 20:08:14,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3550613.3333333335, ans=0.125
2023-11-26 20:08:19,284 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3550, loss[loss=0.06799, simple_loss=0.09271, pruned_loss=0.01433, audio_tagging_loss=0.007309, over 15219.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08999, pruned_loss=0.01209, audio_tagging_loss=0.008585, over 3037231.98 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 8.0
2023-11-26 20:08:24,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3550680.0, ans=0.125
2023-11-26 20:08:35,987 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.32 vs. limit=22.5
2023-11-26 20:08:37,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3550746.6666666665, ans=0.1
2023-11-26 20:08:39,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3550813.3333333335, ans=0.025
2023-11-26 20:08:41,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3550813.3333333335, ans=0.125
2023-11-26 20:08:45,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3550813.3333333335, ans=0.1
2023-11-26 20:09:10,415 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532650
2023-11-26 20:09:11,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3550946.6666666665, ans=0.125
2023-11-26 20:09:14,578 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3600, loss[loss=0.06228, simple_loss=0.09523, pruned_loss=0.008492, audio_tagging_loss=0.006172, over 14923.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08933, pruned_loss=0.01207, audio_tagging_loss=0.008519, over 3041671.22 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:09:16,641 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.819e+01 9.427e+01 1.004e+02 1.284e+02, threshold=1.885e+02, percent-clipped=0.0
2023-11-26 20:09:19,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3551013.3333333335, ans=0.0
2023-11-26 20:09:51,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3551213.3333333335, ans=0.0
2023-11-26 20:09:55,169 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 20:09:58,629 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.10 vs. limit=15.0
2023-11-26 20:10:04,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3551280.0, ans=0.125
2023-11-26 20:10:05,634 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532700
2023-11-26 20:10:09,768 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3650, loss[loss=0.07852, simple_loss=0.1118, pruned_loss=0.01395, audio_tagging_loss=0.008659, over 15123.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08992, pruned_loss=0.01216, audio_tagging_loss=0.008495, over 3042519.78 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0
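The WARNING above documents why certain one-second AudioSet cuts are dropped: after the front-end's roughly 4x subsampling only 23 frames survive, fewer than the 24 BPE tokens of the placeholder transcript, and a transducer loss cannot align more output tokens than it has input frames. A sketch of such a length filter; the frame-shrinkage formula here is an assumption chosen to reproduce the logged 100 -> 23, and the real check lives in train_asr.py:

```python
# Hedged sketch of the cut-exclusion rule behind the warning (illustrative).
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    frames_after_subsampling = (num_frames - 7) // 4   # assumed front-end shrinkage
    return frames_after_subsampling >= num_tokens

print(keep_cut(100, 24))   # False -> excluded, matching the warning above
```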
2023-11-26 20:10:14,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3551346.6666666665, ans=0.125
2023-11-26 20:10:19,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3551346.6666666665, ans=0.125
2023-11-26 20:10:21,789 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.65 vs. limit=15.0
2023-11-26 20:10:23,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3551413.3333333335, ans=0.125
2023-11-26 20:10:25,979 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0
2023-11-26 20:10:39,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3551480.0, ans=0.125
2023-11-26 20:10:49,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3551546.6666666665, ans=0.125
2023-11-26 20:11:02,740 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532750
2023-11-26 20:11:06,839 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3700, loss[loss=0.05881, simple_loss=0.08519, pruned_loss=0.007918, audio_tagging_loss=0.008298, over 14239.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08965, pruned_loss=0.01206, audio_tagging_loss=0.008444, over 3046864.10 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:11:08,928 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.760e+01 8.774e+01 9.496e+01 1.016e+02 1.285e+02, threshold=1.899e+02, percent-clipped=0.0
2023-11-26 20:11:12,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3551680.0, ans=0.125
2023-11-26 20:11:28,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3551813.3333333335, ans=0.07
2023-11-26 20:11:39,029 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.98 vs. limit=22.5
2023-11-26 20:11:45,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3551880.0, ans=0.1
2023-11-26 20:11:53,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3551946.6666666665, ans=0.125
2023-11-26 20:11:58,957 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532800
2023-11-26 20:12:03,433 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3750, loss[loss=0.06613, simple_loss=0.09459, pruned_loss=0.008615, audio_tagging_loss=0.01022, over 14787.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08935, pruned_loss=0.01201, audio_tagging_loss=0.008554, over 3051051.57 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:12:06,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3552013.3333333335, ans=15.0
2023-11-26 20:12:33,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3552146.6666666665, ans=0.125
2023-11-26 20:12:41,465 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 20:12:52,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3552280.0, ans=0.125
2023-11-26 20:12:54,247 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532850
2023-11-26 20:12:58,438 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3800, loss[loss=0.0759, simple_loss=0.1088, pruned_loss=0.01411, audio_tagging_loss=0.007388, over 15166.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.0906, pruned_loss=0.01221, audio_tagging_loss=0.008637, over 3049190.88 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:13:00,551 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.720e+01 8.984e+01 9.632e+01 1.029e+02 1.593e+02, threshold=1.926e+02, percent-clipped=0.0
2023-11-26 20:13:06,808 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.70 vs. limit=15.0
2023-11-26 20:13:10,176 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.05 vs. limit=12.0
2023-11-26 20:13:16,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3552413.3333333335, ans=0.125
2023-11-26 20:13:31,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3552546.6666666665, ans=0.0
2023-11-26 20:13:48,693 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.73 vs. limit=15.0
2023-11-26 20:13:49,771 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532900
2023-11-26 20:13:53,012 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.64 vs. limit=22.5
2023-11-26 20:13:54,532 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3850, loss[loss=0.06626, simple_loss=0.09964, pruned_loss=0.01023, audio_tagging_loss=0.006215, over 16018.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09099, pruned_loss=0.01216, audio_tagging_loss=0.008588, over 3045427.53 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:14:23,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3552813.3333333335, ans=0.125
2023-11-26 20:14:45,347 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 532950
2023-11-26 20:14:49,658 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3900, loss[loss=0.07555, simple_loss=0.1073, pruned_loss=0.01529, audio_tagging_loss=0.006603, over 14902.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09043, pruned_loss=0.01218, audio_tagging_loss=0.008645, over 3040662.25 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:14:52,300 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 8.931e+01 9.529e+01 1.011e+02 1.303e+02, threshold=1.906e+02, percent-clipped=0.0
2023-11-26 20:15:07,391 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 20:15:10,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3553146.6666666665, ans=0.1
2023-11-26 20:15:21,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3553146.6666666665, ans=0.2
2023-11-26 20:15:26,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3553213.3333333335, ans=0.0
2023-11-26 20:15:37,989 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.08 vs. limit=12.0
2023-11-26 20:15:38,344 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.87 vs. limit=22.5
2023-11-26 20:15:38,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3553280.0, ans=0.125
2023-11-26 20:15:40,864 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533000
2023-11-26 20:15:45,314 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 3950, loss[loss=0.06693, simple_loss=0.09364, pruned_loss=0.01292, audio_tagging_loss=0.007186, over 16363.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08991, pruned_loss=0.0121, audio_tagging_loss=0.008616, over 3041545.81 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:15:46,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3553346.6666666665, ans=0.1
2023-11-26 20:15:48,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3553346.6666666665, ans=0.0
2023-11-26 20:15:58,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3553413.3333333335, ans=0.125
2023-11-26 20:16:02,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3553413.3333333335, ans=0.125
2023-11-26 20:16:29,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3553613.3333333335, ans=0.125
2023-11-26 20:16:36,471 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533050
2023-11-26 20:16:39,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3553613.3333333335, ans=0.2
2023-11-26 20:16:42,270 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4000, loss[loss=0.07666, simple_loss=0.1031, pruned_loss=0.01454, audio_tagging_loss=0.01057, over 14305.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09043, pruned_loss=0.01225, audio_tagging_loss=0.008678, over 3046320.76 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 20:16:44,357 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.930e+01 8.850e+01 9.399e+01 1.031e+02 1.680e+02, threshold=1.880e+02, percent-clipped=0.0
2023-11-26 20:16:54,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3553746.6666666665, ans=0.07
2023-11-26 20:17:07,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3553813.3333333335, ans=0.0
2023-11-26 20:17:16,560 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.91 vs. limit=22.5
2023-11-26 20:17:33,298 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533100
2023-11-26 20:17:37,518 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4050, loss[loss=0.05371, simple_loss=0.06981, pruned_loss=0.008291, audio_tagging_loss=0.01052, over 15244.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08968, pruned_loss=0.01211, audio_tagging_loss=0.008833, over 3049529.81 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 20:17:37,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3554013.3333333335, ans=0.125
2023-11-26 20:17:38,153 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.56 vs. limit=15.0
2023-11-26 20:17:39,680 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
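The Whitening entries report a metric compared against a limit for a named activation. One plausible reading of that metric is the ratio of the mean squared eigenvalue of the feature covariance to the squared mean eigenvalue, which is 1.0 for perfectly white (isotropic) features and grows as energy concentrates in a few directions; this is an illustrative formulation, not necessarily the exact one in scaling.py, where a penalty is applied only when the metric exceeds the limit:

```python
# Hedged sketch of a whitening metric in the spirit of the log entries above.
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

x = torch.randn(1000, 384)                    # roughly white features
print(whitening_metric(x))                    # close to 1.0
print(whitening_metric(x * torch.rand(384)))  # anisotropic -> larger metric
```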
2023-11-26 20:17:54,674 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.84 vs. limit=15.0
2023-11-26 20:17:55,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3554080.0, ans=0.125
2023-11-26 20:18:03,696 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.06 vs. limit=10.0
2023-11-26 20:18:28,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3554280.0, ans=0.125
2023-11-26 20:18:29,022 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533150
2023-11-26 20:18:33,721 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4100, loss[loss=0.07288, simple_loss=0.0979, pruned_loss=0.0144, audio_tagging_loss=0.009524, over 14748.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08886, pruned_loss=0.01195, audio_tagging_loss=0.008911, over 3042071.82 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:18:36,817 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.406e+01 8.812e+01 9.418e+01 1.019e+02 1.290e+02, threshold=1.884e+02, percent-clipped=0.0
2023-11-26 20:18:53,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3554413.3333333335, ans=0.125
2023-11-26 20:19:24,909 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533200
2023-11-26 20:19:29,892 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4150, loss[loss=0.05189, simple_loss=0.07994, pruned_loss=0.004532, audio_tagging_loss=0.00739, over 15710.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08904, pruned_loss=0.01197, audio_tagging_loss=0.008752, over 3044152.77 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:19:36,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3554680.0, ans=0.125
2023-11-26 20:19:40,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3554746.6666666665, ans=0.125
2023-11-26 20:19:43,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3554746.6666666665, ans=0.125
2023-11-26 20:19:55,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3554813.3333333335, ans=0.125
2023-11-26 20:19:55,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3554813.3333333335, ans=0.025
2023-11-26 20:20:03,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3554880.0, ans=0.125
2023-11-26 20:20:06,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3554880.0, ans=0.0
2023-11-26 20:20:09,860 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 20:20:11,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3554880.0, ans=0.1
2023-11-26 20:20:20,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3554946.6666666665, ans=0.0
2023-11-26 20:20:21,946 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533250
2023-11-26 20:20:23,794 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.07 vs. limit=8.0
2023-11-26 20:20:25,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3555013.3333333335, ans=0.0
2023-11-26 20:20:26,196 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4200, loss[loss=0.05725, simple_loss=0.07169, pruned_loss=0.0113, audio_tagging_loss=0.01011, over 14612.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08909, pruned_loss=0.01212, audio_tagging_loss=0.008676, over 3033841.57 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:20:29,353 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 8.868e+01 9.396e+01 9.993e+01 1.238e+02, threshold=1.879e+02, percent-clipped=0.0
2023-11-26 20:20:58,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3555146.6666666665, ans=0.125
2023-11-26 20:21:12,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3555280.0, ans=0.1
2023-11-26 20:21:15,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3555280.0, ans=0.07
2023-11-26 20:21:17,448 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533300
2023-11-26 20:21:17,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3555280.0, ans=0.1
2023-11-26 20:21:21,706 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4250, loss[loss=0.07479, simple_loss=0.09874, pruned_loss=0.01632, audio_tagging_loss=0.009093, over 15449.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08912, pruned_loss=0.01213, audio_tagging_loss=0.008529, over 3042633.40 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:21:21,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3555346.6666666665, ans=0.1
2023-11-26 20:21:36,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3555413.3333333335, ans=0.0
2023-11-26 20:21:39,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3555413.3333333335, ans=0.125
2023-11-26 20:21:43,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3555413.3333333335, ans=0.125
2023-11-26 20:21:45,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3555480.0, ans=0.125
2023-11-26 20:21:56,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3555546.6666666665, ans=0.0
2023-11-26 20:21:56,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3555546.6666666665, ans=0.125
2023-11-26 20:22:13,660 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533350
2023-11-26 20:22:17,924 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4300, loss[loss=0.07633, simple_loss=0.1122, pruned_loss=0.01483, audio_tagging_loss=0.005374, over 15695.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09053, pruned_loss=0.01225, audio_tagging_loss=0.008485, over 3051019.27 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:22:21,694 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 9.104e+01 9.879e+01 1.029e+02 1.419e+02, threshold=1.976e+02, percent-clipped=0.0
2023-11-26 20:22:23,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3555680.0, ans=0.125
2023-11-26 20:22:27,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3555680.0, ans=0.125
2023-11-26 20:22:47,660 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.66 vs. limit=10.0
2023-11-26 20:23:02,570 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.55 vs. limit=22.5
2023-11-26 20:23:10,859 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533400
2023-11-26 20:23:15,332 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4350, loss[loss=0.07984, simple_loss=0.1138, pruned_loss=0.01637, audio_tagging_loss=0.006553, over 15800.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09203, pruned_loss=0.01245, audio_tagging_loss=0.00835, over 3056957.54 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:23:26,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3556080.0, ans=0.125
2023-11-26 20:23:33,926 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.95 vs. limit=15.0
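Across the batch records above, the per-batch loss is numerically consistent with 0.5 · simple_loss + pruned_loss + 1.0 · audio_tagging_loss (e.g. 0.5 × 0.09874 + 0.01632 + 0.009093 ≈ 0.07479 for batch 4250), and tot_loss[...] behaves like a frame-weighted running average of the same components. A sketch of that bookkeeping, with the scales inferred from the logged numbers rather than taken from the code:

```python
# Hedged reconstruction of the loss accounting shown in the train_asr records.
class RunningLoss:
    """Frame-weighted running average, like tot_loss[... over N frames]."""
    def __init__(self):
        self.sums = {}
        self.frames = 0.0

    def update(self, metrics: dict, num_frames: float):
        for key, value in metrics.items():
            self.sums[key] = self.sums.get(key, 0.0) + value * num_frames
        self.frames += num_frames

    def average(self) -> dict:
        return {key: total / self.frames for key, total in self.sums.items()}

def combined_loss(simple, pruned, audio_tagging,
                  simple_scale=0.5, audio_tagging_scale=1.0):
    # Scales are inferred from the log, not read from train_asr.py.
    return simple_scale * simple + pruned + audio_tagging_scale * audio_tagging

print(combined_loss(0.09874, 0.01632, 0.009093))  # ~0.07479, matching batch 4250
```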
2023-11-26 20:24:06,177 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533450
2023-11-26 20:24:08,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3556280.0, ans=0.125
2023-11-26 20:24:10,362 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4400, loss[loss=0.05747, simple_loss=0.08426, pruned_loss=0.007774, audio_tagging_loss=0.007563, over 15550.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.09065, pruned_loss=0.01212, audio_tagging_loss=0.008449, over 3061895.22 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 20:24:13,576 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.889e+01 8.838e+01 9.451e+01 1.042e+02 1.230e+02, threshold=1.890e+02, percent-clipped=0.0
2023-11-26 20:24:42,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3556480.0, ans=0.0
2023-11-26 20:24:46,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=3556546.6666666665, ans=0.2
2023-11-26 20:25:01,878 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533500
2023-11-26 20:25:02,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3556613.3333333335, ans=0.125
2023-11-26 20:25:06,071 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4450, loss[loss=0.07154, simple_loss=0.1019, pruned_loss=0.01356, audio_tagging_loss=0.007013, over 15292.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08929, pruned_loss=0.0119, audio_tagging_loss=0.008569, over 3063566.83 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 20:25:06,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3556680.0, ans=0.125
2023-11-26 20:25:41,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3556880.0, ans=0.2
2023-11-26 20:25:58,337 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533550
2023-11-26 20:26:02,376 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4500, loss[loss=0.05685, simple_loss=0.07522, pruned_loss=0.011, audio_tagging_loss=0.008239, over 14464.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08877, pruned_loss=0.01185, audio_tagging_loss=0.008624, over 3060729.76 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 20:26:05,646 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.109e+01 9.005e+01 9.519e+01 1.049e+02 1.463e+02, threshold=1.904e+02, percent-clipped=0.0
2023-11-26 20:26:07,327 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.24 vs. limit=22.5
2023-11-26 20:26:24,461 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.11 vs. limit=22.5
2023-11-26 20:26:27,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3557146.6666666665, ans=0.0
2023-11-26 20:26:53,422 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533600
2023-11-26 20:26:57,879 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4550, loss[loss=0.07347, simple_loss=0.1006, pruned_loss=0.01283, audio_tagging_loss=0.01036, over 17335.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08948, pruned_loss=0.012, audio_tagging_loss=0.008613, over 3058294.52 frames. ], batch size: 62, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 20:27:35,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3557546.6666666665, ans=0.0
2023-11-26 20:27:40,700 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 20:27:49,233 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533650
2023-11-26 20:27:53,362 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4600, loss[loss=0.05455, simple_loss=0.07219, pruned_loss=0.007506, audio_tagging_loss=0.01095, over 15711.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08896, pruned_loss=0.01191, audio_tagging_loss=0.008627, over 3047947.54 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 20:27:56,029 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.37 vs. limit=6.0
2023-11-26 20:27:56,453 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.271e+01 8.920e+01 9.626e+01 1.020e+02 1.318e+02, threshold=1.925e+02, percent-clipped=0.0
2023-11-26 20:27:57,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3557680.0, ans=0.125
2023-11-26 20:28:13,671 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 20:28:20,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3557813.3333333335, ans=0.1
2023-11-26 20:28:45,466 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533700
2023-11-26 20:28:50,197 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4650, loss[loss=0.06695, simple_loss=0.08308, pruned_loss=0.01737, audio_tagging_loss=0.008038, over 14148.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.089, pruned_loss=0.01207, audio_tagging_loss=0.008691, over 3040256.25 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:29:04,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3558080.0, ans=0.125
2023-11-26 20:29:05,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3558080.0, ans=0.0
2023-11-26 20:29:12,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3558146.6666666665, ans=0.125
2023-11-26 20:29:22,058 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 20:29:41,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3558280.0, ans=0.125
2023-11-26 20:29:42,056 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533750
2023-11-26 20:29:46,212 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4700, loss[loss=0.06909, simple_loss=0.09394, pruned_loss=0.01401, audio_tagging_loss=0.008111, over 15573.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08945, pruned_loss=0.01207, audio_tagging_loss=0.008718, over 3038972.88 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:29:46,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3558346.6666666665, ans=0.125
2023-11-26 20:29:50,420 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.173e+01 8.876e+01 9.435e+01 1.008e+02 1.247e+02, threshold=1.887e+02, percent-clipped=0.0
2023-11-26 20:29:57,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3558413.3333333335, ans=0.1
2023-11-26 20:30:00,444 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.73 vs. limit=15.0
2023-11-26 20:30:14,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3558480.0, ans=0.125
2023-11-26 20:30:26,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3558546.6666666665, ans=0.2
2023-11-26 20:30:37,102 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533800
2023-11-26 20:30:41,607 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4750, loss[loss=0.06179, simple_loss=0.08881, pruned_loss=0.008959, audio_tagging_loss=0.008423, over 16497.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08887, pruned_loss=0.01194, audio_tagging_loss=0.008775, over 3037712.89 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:30:43,228 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.95 vs. limit=15.0
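The grad_scale field in the batch records (stepping between 8.0, 16.0 and 32.0 above) is the running fp16 loss scale used for mixed-precision training. A minimal sketch of how such dynamic scaling is typically driven with torch.cuda.amp; train_asr.py may wire this differently:

```python
# Illustrative mixed-precision step (not train_asr.py's exact code).
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)

def train_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(batch["inputs"]), batch["targets"])
    scaler.scale(loss).backward()   # backprop on the scaled loss
    scaler.step(optimizer)          # unscales grads; skips the step on inf/nan
    scaler.update()                 # grows the scale periodically, halves on overflow
    return loss.detach()
```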
2023-11-26 20:30:52,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3558746.6666666665, ans=0.125
2023-11-26 20:31:07,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3558813.3333333335, ans=0.1
2023-11-26 20:31:20,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3558880.0, ans=0.125
2023-11-26 20:31:29,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3558946.6666666665, ans=0.2
2023-11-26 20:31:30,158 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.07 vs. limit=15.0
2023-11-26 20:31:33,060 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533850
2023-11-26 20:31:37,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3559013.3333333335, ans=0.125
2023-11-26 20:31:38,363 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4800, loss[loss=0.06382, simple_loss=0.09385, pruned_loss=0.00923, audio_tagging_loss=0.007663, over 14230.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08967, pruned_loss=0.01214, audio_tagging_loss=0.00885, over 3043896.41 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 20:31:39,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3559013.3333333335, ans=0.125
2023-11-26 20:31:42,641 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.975e+01 9.037e+01 9.476e+01 1.008e+02 1.757e+02, threshold=1.895e+02, percent-clipped=0.0
2023-11-26 20:31:49,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3559080.0, ans=0.2
2023-11-26 20:32:10,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3559213.3333333335, ans=0.1
2023-11-26 20:32:29,574 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533900
2023-11-26 20:32:34,181 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4850, loss[loss=0.07281, simple_loss=0.09937, pruned_loss=0.0156, audio_tagging_loss=0.007528, over 15791.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08938, pruned_loss=0.01213, audio_tagging_loss=0.008912, over 3042379.49 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:32:38,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3559346.6666666665, ans=0.2
2023-11-26 20:33:07,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3559546.6666666665, ans=0.125
2023-11-26 20:33:16,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3559546.6666666665, ans=0.0
2023-11-26 20:33:24,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3559613.3333333335, ans=0.125
2023-11-26 20:33:25,461 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 533950
2023-11-26 20:33:29,635 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4900, loss[loss=0.06069, simple_loss=0.07699, pruned_loss=0.009464, audio_tagging_loss=0.01273, over 14277.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09033, pruned_loss=0.01232, audio_tagging_loss=0.008816, over 3040005.54 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:33:34,852 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.612e+01 8.987e+01 9.681e+01 1.025e+02 1.624e+02, threshold=1.936e+02, percent-clipped=0.0
2023-11-26 20:33:39,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3559746.6666666665, ans=0.05
2023-11-26 20:34:07,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3559880.0, ans=0.125
2023-11-26 20:34:18,742 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.41 vs. limit=15.0
2023-11-26 20:34:20,407 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534000
2023-11-26 20:34:25,472 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 4950, loss[loss=0.05097, simple_loss=0.066, pruned_loss=0.006638, audio_tagging_loss=0.01133, over 13977.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08947, pruned_loss=0.0123, audio_tagging_loss=0.008712, over 3041445.30 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:34:29,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3560013.3333333335, ans=0.0
2023-11-26 20:34:46,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3560146.6666666665, ans=0.025
2023-11-26 20:34:54,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3560146.6666666665, ans=0.05
2023-11-26 20:34:58,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3560213.3333333335, ans=0.2
2023-11-26 20:35:12,237 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 20:35:14,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3560280.0, ans=0.025
2023-11-26 20:35:16,344 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534050
2023-11-26 20:35:20,446 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5000, loss[loss=0.06099, simple_loss=0.07824, pruned_loss=0.008386, audio_tagging_loss=0.01348, over 15147.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08878, pruned_loss=0.01213, audio_tagging_loss=0.008695, over 3042887.66 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:35:25,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=3560346.6666666665, ans=0.95
2023-11-26 20:35:26,189 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.628e+01 9.102e+01 9.598e+01 1.044e+02 1.473e+02, threshold=1.920e+02, percent-clipped=0.0
2023-11-26 20:35:28,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3560346.6666666665, ans=0.1
2023-11-26 20:35:31,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3560413.3333333335, ans=0.125
2023-11-26 20:35:35,292 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.48 vs. limit=10.0
2023-11-26 20:35:42,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3560480.0, ans=0.0
2023-11-26 20:35:56,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3560546.6666666665, ans=0.125
2023-11-26 20:36:12,091 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534100
2023-11-26 20:36:16,213 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5050, loss[loss=0.04984, simple_loss=0.06516, pruned_loss=0.008553, audio_tagging_loss=0.008704, over 15000.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08798, pruned_loss=0.01193, audio_tagging_loss=0.008685, over 3039756.76 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:36:23,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3560680.0, ans=0.0
2023-11-26 20:36:37,296 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.97 vs. limit=15.0
2023-11-26 20:37:07,430 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534150
2023-11-26 20:37:07,995 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.87 vs. limit=15.0
2023-11-26 20:37:12,197 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5100, loss[loss=0.0395, simple_loss=0.04096, pruned_loss=0.004863, audio_tagging_loss=0.01416, over 14091.00 frames. ], tot_loss[loss=0.06424, simple_loss=0.08761, pruned_loss=0.01178, audio_tagging_loss=0.008659, over 3040673.92 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:37:12,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3561013.3333333335, ans=0.0
2023-11-26 20:37:17,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3561013.3333333335, ans=0.125
2023-11-26 20:37:18,597 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.922e+01 9.558e+01 1.035e+02 1.358e+02, threshold=1.912e+02, percent-clipped=0.0
2023-11-26 20:37:19,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3561013.3333333335, ans=0.0
2023-11-26 20:37:40,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3561146.6666666665, ans=0.125
2023-11-26 20:37:41,901 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 20:38:04,570 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534200
2023-11-26 20:38:09,083 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5150, loss[loss=0.07359, simple_loss=0.09381, pruned_loss=0.01988, audio_tagging_loss=0.006814, over 14956.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08805, pruned_loss=0.01192, audio_tagging_loss=0.008637, over 3038826.15 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:38:34,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3561480.0, ans=0.04949747468305833
2023-11-26 20:38:44,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3561546.6666666665, ans=0.125
2023-11-26 20:38:55,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3561613.3333333335, ans=0.125
2023-11-26 20:38:58,730 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.13 vs. limit=22.5
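The balancer fields in the ScheduledFloat records (prob, min_positive, max_positive, min_abs) suggest a regularizer that keeps per-channel activation statistics in a healthy range: the fraction of positive values within [min_positive, max_positive] and the mean absolute value above min_abs, applied stochastically with probability prob. An illustrative diagnostic in that spirit; the real scaling.py Balancer applies gradient corrections rather than merely reporting:

```python
# Hedged sketch: report which channels would violate balancer-style constraints.
import torch

def balance_report(x: torch.Tensor, min_positive=0.05, max_positive=0.95, min_abs=0.2):
    # x: (frames, channels) activations
    pos_frac = (x > 0).float().mean(dim=0)
    mean_abs = x.abs().mean(dim=0)
    return {
        "channels_too_negative": int((pos_frac < min_positive).sum()),
        "channels_too_positive": int((pos_frac > max_positive).sum()),
        "channels_too_small": int((mean_abs < min_abs).sum()),
    }

print(balance_report(torch.randn(1000, 256)))
```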
2023-11-26 20:39:00,363 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534250
2023-11-26 20:39:01,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3561613.3333333335, ans=0.2
2023-11-26 20:39:05,070 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5200, loss[loss=0.07813, simple_loss=0.1182, pruned_loss=0.01425, audio_tagging_loss=0.004781, over 15556.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08861, pruned_loss=0.01198, audio_tagging_loss=0.008614, over 3034850.90 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 20:39:10,322 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.315e+01 8.817e+01 9.257e+01 9.950e+01 1.216e+02, threshold=1.851e+02, percent-clipped=0.0
2023-11-26 20:39:24,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3561746.6666666665, ans=0.1
2023-11-26 20:39:27,659 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.46 vs. limit=15.0
2023-11-26 20:39:43,338 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.44 vs. limit=22.5
2023-11-26 20:39:49,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3561946.6666666665, ans=0.5
2023-11-26 20:39:56,205 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534300
2023-11-26 20:40:00,344 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5250, loss[loss=0.06742, simple_loss=0.09171, pruned_loss=0.01325, audio_tagging_loss=0.008306, over 14758.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08871, pruned_loss=0.01193, audio_tagging_loss=0.008524, over 3042102.42 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 20:40:02,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3562013.3333333335, ans=0.125
2023-11-26 20:40:19,653 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.05 vs. limit=15.0
2023-11-26 20:40:53,569 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534350
2023-11-26 20:40:57,736 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5300, loss[loss=0.09354, simple_loss=0.1359, pruned_loss=0.01906, audio_tagging_loss=0.006548, over 15330.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.0899, pruned_loss=0.01216, audio_tagging_loss=0.008498, over 3048788.98 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 20:41:00,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3562346.6666666665, ans=0.125
2023-11-26 20:41:02,961 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.709e+01 8.749e+01 9.362e+01 1.021e+02 1.179e+02, threshold=1.872e+02, percent-clipped=0.0
2023-11-26 20:41:04,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3562346.6666666665, ans=0.0
2023-11-26 20:41:06,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3562346.6666666665, ans=0.1
2023-11-26 20:41:09,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3562413.3333333335, ans=15.0
2023-11-26 20:41:24,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3562480.0, ans=0.1
2023-11-26 20:41:24,163 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 20:41:27,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3562480.0, ans=0.125
2023-11-26 20:41:48,815 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534400
2023-11-26 20:41:53,341 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5350, loss[loss=0.07346, simple_loss=0.1044, pruned_loss=0.01269, audio_tagging_loss=0.008571, over 15846.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.09042, pruned_loss=0.0121, audio_tagging_loss=0.008592, over 3041102.38 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:41:54,572 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 20:41:58,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3562680.0, ans=0.125
2023-11-26 20:42:26,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3562880.0, ans=0.125
2023-11-26 20:42:40,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3562946.6666666665, ans=0.125
2023-11-26 20:42:45,106 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534450
2023-11-26 20:42:49,304 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5400, loss[loss=0.07092, simple_loss=0.09724, pruned_loss=0.01523, audio_tagging_loss=0.007072, over 15585.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.09024, pruned_loss=0.0121, audio_tagging_loss=0.008631, over 3036903.61 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:42:52,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3563013.3333333335, ans=0.125
2023-11-26 20:42:56,043 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.432e+01 8.834e+01 9.520e+01 1.043e+02 1.175e+02, threshold=1.904e+02, percent-clipped=0.0
2023-11-26 20:43:04,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3563080.0, ans=0.125
2023-11-26 20:43:29,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3563213.3333333335, ans=0.125
2023-11-26 20:43:37,215 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.88 vs. limit=6.0
2023-11-26 20:43:41,991 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534500
2023-11-26 20:43:45,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3563346.6666666665, ans=0.0
2023-11-26 20:43:46,137 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5450, loss[loss=0.07918, simple_loss=0.1094, pruned_loss=0.01772, audio_tagging_loss=0.006749, over 16155.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09122, pruned_loss=0.01235, audio_tagging_loss=0.008551, over 3036569.59 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:43:52,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3563346.6666666665, ans=0.125
2023-11-26 20:44:25,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3563546.6666666665, ans=0.0
2023-11-26 20:44:31,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3563613.3333333335, ans=0.2
2023-11-26 20:44:33,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3563613.3333333335, ans=0.0
2023-11-26 20:44:37,364 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534550
2023-11-26 20:44:41,539 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5500, loss[loss=0.062, simple_loss=0.08406, pruned_loss=0.01168, audio_tagging_loss=0.008285, over 15811.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09065, pruned_loss=0.01228, audio_tagging_loss=0.008653, over 3035830.09 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:44:47,928 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.824e+01 9.118e+01 9.897e+01 1.074e+02 1.555e+02, threshold=1.979e+02, percent-clipped=0.0
2023-11-26 20:44:51,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3563746.6666666665, ans=0.0
2023-11-26 20:45:07,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3563813.3333333335, ans=0.1
2023-11-26 20:45:32,790 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534600
2023-11-26 20:45:34,505 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=15.0
2023-11-26 20:45:37,296 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5550, loss[loss=0.06623, simple_loss=0.08448, pruned_loss=0.01193, audio_tagging_loss=0.01206, over 15047.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09025, pruned_loss=0.0122, audio_tagging_loss=0.008713, over 3038413.59 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:45:43,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3564013.3333333335, ans=0.125
2023-11-26 20:45:46,350 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.20 vs. limit=15.0
2023-11-26 20:45:51,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3564080.0, ans=0.95
2023-11-26 20:46:10,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3564213.3333333335, ans=0.125
2023-11-26 20:46:29,307 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534650
2023-11-26 20:46:34,583 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5600, loss[loss=0.07248, simple_loss=0.105, pruned_loss=0.01328, audio_tagging_loss=0.006677, over 16666.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09014, pruned_loss=0.01223, audio_tagging_loss=0.008796, over 3038799.81 frames. ], batch size: 62, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 20:46:40,928 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.218e+01 8.848e+01 9.516e+01 1.047e+02 1.275e+02, threshold=1.903e+02, percent-clipped=0.0
2023-11-26 20:46:41,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3564346.6666666665, ans=0.125
2023-11-26 20:46:44,947 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.75 vs. limit=8.0
2023-11-26 20:46:45,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3564413.3333333335, ans=0.1
2023-11-26 20:46:50,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3564413.3333333335, ans=0.125
2023-11-26 20:47:02,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3564480.0, ans=0.0
2023-11-26 20:47:14,949 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 20:47:25,532 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534700
2023-11-26 20:47:29,693 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5650, loss[loss=0.07141, simple_loss=0.09245, pruned_loss=0.0168, audio_tagging_loss=0.00838, over 15129.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08957, pruned_loss=0.01212, audio_tagging_loss=0.008914, over 3046207.36 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:47:29,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3564680.0, ans=0.125
2023-11-26 20:47:30,153 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0
2023-11-26 20:47:31,238 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=5.94 vs. limit=15.0
2023-11-26 20:48:04,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3564880.0, ans=0.2
2023-11-26 20:48:21,032 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534750
2023-11-26 20:48:24,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3565013.3333333335, ans=0.125
2023-11-26 20:48:25,196 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5700, loss[loss=0.05728, simple_loss=0.07891, pruned_loss=0.01111, audio_tagging_loss=0.006706, over 15549.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08878, pruned_loss=0.01194, audio_tagging_loss=0.008934, over 3038988.68 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:48:33,025 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.188e+01 8.707e+01 9.299e+01 1.009e+02 1.151e+02, threshold=1.860e+02, percent-clipped=0.0
2023-11-26 20:48:37,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3565080.0, ans=0.0
2023-11-26 20:48:37,355 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.93 vs. limit=22.5
2023-11-26 20:48:43,691 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.13 vs. limit=12.0
2023-11-26 20:48:46,951 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.69 vs. limit=10.0
2023-11-26 20:49:01,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3565213.3333333335, ans=0.05
2023-11-26 20:49:15,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3565280.0, ans=0.2
2023-11-26 20:49:16,925 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534800
2023-11-26 20:49:17,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3565280.0, ans=0.0
2023-11-26 20:49:21,916 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5750, loss[loss=0.06646, simple_loss=0.09522, pruned_loss=0.01056, audio_tagging_loss=0.008293, over 15581.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08903, pruned_loss=0.01199, audio_tagging_loss=0.008796, over 3044920.67 frames.
], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:49:38,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3565413.3333333335, ans=0.1 2023-11-26 20:49:45,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3565480.0, ans=0.0 2023-11-26 20:49:51,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3565480.0, ans=0.0 2023-11-26 20:49:55,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3565546.6666666665, ans=0.125 2023-11-26 20:50:08,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=3565613.3333333335, ans=0.1 2023-11-26 20:50:12,612 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534850 2023-11-26 20:50:16,757 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5800, loss[loss=0.06185, simple_loss=0.08571, pruned_loss=0.01108, audio_tagging_loss=0.007912, over 14598.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08951, pruned_loss=0.01206, audio_tagging_loss=0.008635, over 3042087.65 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:50:19,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3565680.0, ans=10.0 2023-11-26 20:50:24,139 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 8.906e+01 9.529e+01 1.040e+02 1.512e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 20:50:27,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3565746.6666666665, ans=0.0 2023-11-26 20:50:36,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3565746.6666666665, ans=0.0 2023-11-26 20:50:39,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3565813.3333333335, ans=0.125 2023-11-26 20:51:07,134 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534900 2023-11-26 20:51:09,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3565946.6666666665, ans=0.0 2023-11-26 20:51:11,345 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5850, loss[loss=0.07548, simple_loss=0.09997, pruned_loss=0.01742, audio_tagging_loss=0.008078, over 15691.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09055, pruned_loss=0.01223, audio_tagging_loss=0.008579, over 3045421.48 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:51:16,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3566013.3333333335, ans=0.125 2023-11-26 20:51:20,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3566013.3333333335, ans=0.1 2023-11-26 20:51:26,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3566080.0, ans=0.1 2023-11-26 20:51:28,849 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.41 vs. 
limit=15.0 2023-11-26 20:51:41,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3566146.6666666665, ans=0.125 2023-11-26 20:51:50,158 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.75 vs. limit=15.0 2023-11-26 20:51:50,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3566213.3333333335, ans=0.0 2023-11-26 20:52:01,264 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 534950 2023-11-26 20:52:04,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3566280.0, ans=0.09899494936611666 2023-11-26 20:52:05,960 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5900, loss[loss=0.05247, simple_loss=0.07048, pruned_loss=0.009185, audio_tagging_loss=0.008047, over 12669.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09079, pruned_loss=0.01227, audio_tagging_loss=0.008526, over 3045809.85 frames. ], batch size: 50, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:52:14,444 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.237e+01 8.767e+01 9.381e+01 1.012e+02 1.422e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-26 20:52:33,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3566480.0, ans=0.2 2023-11-26 20:52:40,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3566546.6666666665, ans=0.0 2023-11-26 20:52:57,790 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535000 2023-11-26 20:53:02,261 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 5950, loss[loss=0.06119, simple_loss=0.08688, pruned_loss=0.008699, audio_tagging_loss=0.009048, over 15611.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09141, pruned_loss=0.01249, audio_tagging_loss=0.008507, over 3048454.76 frames. ], batch size: 62, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:53:02,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3566680.0, ans=0.0 2023-11-26 20:53:24,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3566813.3333333335, ans=0.125 2023-11-26 20:53:39,656 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.99 vs. limit=10.0 2023-11-26 20:53:53,254 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535050 2023-11-26 20:53:57,429 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6000, loss[loss=0.06565, simple_loss=0.09861, pruned_loss=0.009838, audio_tagging_loss=0.006508, over 14450.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09091, pruned_loss=0.01246, audio_tagging_loss=0.008469, over 3049388.39 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:53:57,432 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-26 20:54:29,561 INFO [train_asr.py:1267] (0/4) Epoch 45, validation: loss=0.05766, simple_loss=0.05058, pruned_loss=0.005348, audio_tagging_loss=0.02702, over 4681554.00 frames. 
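The loss fields printed by train_asr.py decompose consistently throughout this log: every tot_loss, and the validation loss just above, matches 0.5 * simple_loss + pruned_loss + audio_tagging_loss to within rounding (for the validation entry: 0.5 * 0.05058 + 0.005348 + 0.02702 ≈ 0.05766). A minimal sketch of that bookkeeping follows; the weights 0.5 and 1.0 are inferred from the printed numbers, and the helper is hypothetical, not icefall's actual code.

    # A minimal sketch, assuming the reported total is
    #   tot_loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss
    # (weights inferred from the printed values in this log).
    def total_loss(simple_loss: float, pruned_loss: float,
                   audio_tagging_loss: float) -> float:
        return 0.5 * simple_loss + pruned_loss + audio_tagging_loss

    # Validation entry above: loss=0.05766
    assert abs(total_loss(0.05058, 0.005348, 0.02702) - 0.05766) < 1e-4
    # Training entry at Epoch 45, batch 5350: tot_loss=0.06591
    assert abs(total_loss(0.09042, 0.0121, 0.008592) - 0.06591) < 1e-4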
2023-11-26 20:54:29,561 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-26 20:54:33,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3567013.3333333335, ans=0.1 2023-11-26 20:54:37,352 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.342e+01 8.765e+01 9.407e+01 1.018e+02 1.240e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 20:54:58,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3567146.6666666665, ans=0.125 2023-11-26 20:55:09,190 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 20:55:20,914 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535100 2023-11-26 20:55:21,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3567280.0, ans=0.2 2023-11-26 20:55:25,116 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6050, loss[loss=0.06776, simple_loss=0.09155, pruned_loss=0.01155, audio_tagging_loss=0.01044, over 15775.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09043, pruned_loss=0.01239, audio_tagging_loss=0.008567, over 3045490.70 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:55:31,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3567346.6666666665, ans=0.125 2023-11-26 20:55:48,345 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.15 vs. limit=10.0 2023-11-26 20:56:16,565 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535150 2023-11-26 20:56:20,755 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6100, loss[loss=0.05082, simple_loss=0.0578, pruned_loss=0.007968, audio_tagging_loss=0.01395, over 15992.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09053, pruned_loss=0.01239, audio_tagging_loss=0.008513, over 3048953.27 frames. ], batch size: 63, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:56:28,134 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.197e+01 8.965e+01 9.690e+01 1.035e+02 1.368e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-26 20:56:29,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3567680.0, ans=0.0 2023-11-26 20:56:34,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3567746.6666666665, ans=0.0 2023-11-26 20:56:42,931 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.65 vs. limit=6.0 2023-11-26 20:56:47,633 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.32 vs. 
limit=15.0 2023-11-26 20:56:52,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3567813.3333333335, ans=0.125 2023-11-26 20:57:04,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3567946.6666666665, ans=0.125 2023-11-26 20:57:06,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3567946.6666666665, ans=0.125 2023-11-26 20:57:07,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3567946.6666666665, ans=0.025 2023-11-26 20:57:09,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3567946.6666666665, ans=0.0 2023-11-26 20:57:11,504 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535200 2023-11-26 20:57:15,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3568013.3333333335, ans=0.125 2023-11-26 20:57:17,029 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6150, loss[loss=0.08759, simple_loss=0.1212, pruned_loss=0.01756, audio_tagging_loss=0.009413, over 15283.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09066, pruned_loss=0.01231, audio_tagging_loss=0.008617, over 3047910.90 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:57:28,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3568080.0, ans=0.025 2023-11-26 20:57:37,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3568080.0, ans=0.2 2023-11-26 20:57:41,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3568146.6666666665, ans=0.125 2023-11-26 20:57:43,683 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 20:57:47,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=3568146.6666666665, ans=15.0 2023-11-26 20:57:55,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3568213.3333333335, ans=0.0 2023-11-26 20:58:08,727 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535250 2023-11-26 20:58:13,469 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6200, loss[loss=0.06616, simple_loss=0.0882, pruned_loss=0.01238, audio_tagging_loss=0.009685, over 15303.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08948, pruned_loss=0.01205, audio_tagging_loss=0.008699, over 3052773.07 frames. 
], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:58:15,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3568346.6666666665, ans=10.0 2023-11-26 20:58:15,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3568346.6666666665, ans=0.125 2023-11-26 20:58:20,986 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.437e+01 8.899e+01 9.421e+01 1.012e+02 1.333e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 20:58:34,005 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 20:58:40,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3568480.0, ans=0.2 2023-11-26 20:58:48,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3568546.6666666665, ans=0.125 2023-11-26 20:59:03,989 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535300 2023-11-26 20:59:06,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3568613.3333333335, ans=0.2 2023-11-26 20:59:08,232 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6250, loss[loss=0.06621, simple_loss=0.09625, pruned_loss=0.008983, audio_tagging_loss=0.009104, over 15629.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.0894, pruned_loss=0.01208, audio_tagging_loss=0.008685, over 3050992.18 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:59:18,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3568746.6666666665, ans=0.125 2023-11-26 20:59:19,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3568746.6666666665, ans=0.05 2023-11-26 20:59:19,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3568746.6666666665, ans=0.125 2023-11-26 20:59:52,605 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.69 vs. limit=22.5 2023-11-26 20:59:58,311 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535350 2023-11-26 21:00:02,483 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6300, loss[loss=0.07788, simple_loss=0.1136, pruned_loss=0.01576, audio_tagging_loss=0.005317, over 16848.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09007, pruned_loss=0.01235, audio_tagging_loss=0.008777, over 3059280.04 frames. 
], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:00:02,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3569013.3333333335, ans=0.125 2023-11-26 21:00:05,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3569013.3333333335, ans=0.125 2023-11-26 21:00:12,052 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.840e+01 9.586e+01 1.026e+02 1.198e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-26 21:00:40,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3569213.3333333335, ans=0.1 2023-11-26 21:00:53,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3569280.0, ans=0.125 2023-11-26 21:00:54,203 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535400 2023-11-26 21:00:58,583 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6350, loss[loss=0.09673, simple_loss=0.1298, pruned_loss=0.02459, audio_tagging_loss=0.007256, over 15341.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09002, pruned_loss=0.01234, audio_tagging_loss=0.008839, over 3055451.28 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:00:59,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3569346.6666666665, ans=0.0 2023-11-26 21:01:18,934 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.38 vs. limit=15.0 2023-11-26 21:01:35,880 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.48 vs. limit=12.0 2023-11-26 21:01:49,816 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535450 2023-11-26 21:01:53,951 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6400, loss[loss=0.07004, simple_loss=0.08486, pruned_loss=0.01848, audio_tagging_loss=0.009131, over 15110.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08954, pruned_loss=0.01223, audio_tagging_loss=0.008952, over 3051392.09 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:02:02,596 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.564e+01 8.580e+01 9.385e+01 1.005e+02 1.222e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 21:02:09,483 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.06 vs. limit=15.0 2023-11-26 21:02:23,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3569813.3333333335, ans=0.125 2023-11-26 21:02:38,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3569946.6666666665, ans=0.0 2023-11-26 21:02:43,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3569946.6666666665, ans=0.125 2023-11-26 21:02:44,670 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535500 2023-11-26 21:02:48,864 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6450, loss[loss=0.05113, simple_loss=0.05869, pruned_loss=0.009045, audio_tagging_loss=0.01274, over 14258.00 frames. 
], tot_loss[loss=0.0661, simple_loss=0.08973, pruned_loss=0.01222, audio_tagging_loss=0.009019, over 3044432.67 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:02:55,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3570013.3333333335, ans=0.2 2023-11-26 21:02:55,859 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.10 vs. limit=10.0 2023-11-26 21:02:56,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3570013.3333333335, ans=0.125 2023-11-26 21:03:02,503 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0 2023-11-26 21:03:05,459 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.96 vs. limit=15.0 2023-11-26 21:03:26,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3570213.3333333335, ans=0.2 2023-11-26 21:03:30,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3570213.3333333335, ans=0.95 2023-11-26 21:03:40,696 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535550 2023-11-26 21:03:40,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3570280.0, ans=0.125 2023-11-26 21:03:44,899 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6500, loss[loss=0.06944, simple_loss=0.08713, pruned_loss=0.01224, audio_tagging_loss=0.01363, over 15083.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08985, pruned_loss=0.01227, audio_tagging_loss=0.008912, over 3045680.75 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:03:46,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3570346.6666666665, ans=0.125 2023-11-26 21:03:53,423 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.873e+01 8.670e+01 9.516e+01 1.047e+02 1.238e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-26 21:03:53,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3570346.6666666665, ans=0.2 2023-11-26 21:04:05,841 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:04:15,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3570480.0, ans=0.125 2023-11-26 21:04:16,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3570546.6666666665, ans=0.0 2023-11-26 21:04:35,325 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535600 2023-11-26 21:04:39,830 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6550, loss[loss=0.0531, simple_loss=0.07306, pruned_loss=0.00646, audio_tagging_loss=0.01011, over 15249.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08986, pruned_loss=0.01221, audio_tagging_loss=0.0087, over 3037771.13 frames. 
], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:04:40,445 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2023-11-26 21:05:06,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3570813.3333333335, ans=0.125 2023-11-26 21:05:08,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3570813.3333333335, ans=0.1 2023-11-26 21:05:09,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3570813.3333333335, ans=0.125 2023-11-26 21:05:13,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3570880.0, ans=0.04949747468305833 2023-11-26 21:05:15,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3570880.0, ans=0.125 2023-11-26 21:05:27,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3570946.6666666665, ans=0.125 2023-11-26 21:05:27,261 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.45 vs. limit=15.0 2023-11-26 21:05:27,422 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.62 vs. limit=15.0 2023-11-26 21:05:31,088 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535650 2023-11-26 21:05:35,373 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6600, loss[loss=0.06626, simple_loss=0.08907, pruned_loss=0.01284, audio_tagging_loss=0.008876, over 13499.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08885, pruned_loss=0.01211, audio_tagging_loss=0.008605, over 3037910.45 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:05:44,994 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.643e+01 8.935e+01 9.455e+01 1.019e+02 1.266e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-26 21:05:46,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3571080.0, ans=0.0 2023-11-26 21:06:25,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3571280.0, ans=0.1 2023-11-26 21:06:26,880 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535700 2023-11-26 21:06:29,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3571280.0, ans=0.125 2023-11-26 21:06:30,990 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6650, loss[loss=0.05313, simple_loss=0.06564, pruned_loss=0.00846, audio_tagging_loss=0.01185, over 14942.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08926, pruned_loss=0.01221, audio_tagging_loss=0.008523, over 3038413.31 frames. 
], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:07:12,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3571546.6666666665, ans=0.125 2023-11-26 21:07:15,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3571613.3333333335, ans=0.125 2023-11-26 21:07:21,154 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535750 2023-11-26 21:07:21,668 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.58 vs. limit=6.0 2023-11-26 21:07:25,286 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6700, loss[loss=0.08286, simple_loss=0.1119, pruned_loss=0.01798, audio_tagging_loss=0.008925, over 15327.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08842, pruned_loss=0.01215, audio_tagging_loss=0.008545, over 3034871.61 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:07:34,797 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.049e+01 8.689e+01 9.559e+01 1.023e+02 3.616e+02, threshold=1.912e+02, percent-clipped=1.0 2023-11-26 21:07:46,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3571813.3333333335, ans=0.1 2023-11-26 21:07:57,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3571880.0, ans=0.2 2023-11-26 21:08:15,808 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535800 2023-11-26 21:08:20,259 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6750, loss[loss=0.08146, simple_loss=0.1075, pruned_loss=0.02075, audio_tagging_loss=0.006964, over 15834.00 frames. ], tot_loss[loss=0.06438, simple_loss=0.08781, pruned_loss=0.01192, audio_tagging_loss=0.008545, over 3035082.16 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:08:26,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3572013.3333333335, ans=0.04949747468305833 2023-11-26 21:08:35,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3572080.0, ans=0.1 2023-11-26 21:08:40,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3572080.0, ans=0.0 2023-11-26 21:08:40,106 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:08:57,370 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.98 vs. limit=6.0 2023-11-26 21:09:08,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3572280.0, ans=0.125 2023-11-26 21:09:11,885 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535850 2023-11-26 21:09:16,569 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6800, loss[loss=0.07128, simple_loss=0.1038, pruned_loss=0.01105, audio_tagging_loss=0.008349, over 15058.00 frames. ], tot_loss[loss=0.06434, simple_loss=0.08769, pruned_loss=0.01188, audio_tagging_loss=0.00862, over 3036038.96 frames. 
], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:09:24,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3572346.6666666665, ans=0.125 2023-11-26 21:09:26,074 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.921e+01 8.870e+01 9.420e+01 1.023e+02 1.274e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 21:09:35,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3572413.3333333335, ans=0.125 2023-11-26 21:09:43,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3572480.0, ans=0.0 2023-11-26 21:09:53,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3572546.6666666665, ans=0.5 2023-11-26 21:10:01,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3572613.3333333335, ans=0.125 2023-11-26 21:10:07,129 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535900 2023-11-26 21:10:11,383 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6850, loss[loss=0.07572, simple_loss=0.1022, pruned_loss=0.01751, audio_tagging_loss=0.007121, over 14932.00 frames. ], tot_loss[loss=0.06428, simple_loss=0.08752, pruned_loss=0.01186, audio_tagging_loss=0.008658, over 3034552.41 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:10:13,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3572680.0, ans=0.2 2023-11-26 21:10:21,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3572746.6666666665, ans=0.07 2023-11-26 21:10:29,921 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.88 vs. limit=15.0 2023-11-26 21:10:33,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3572813.3333333335, ans=0.125 2023-11-26 21:10:36,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3572813.3333333335, ans=0.125 2023-11-26 21:11:02,627 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 535950 2023-11-26 21:11:06,887 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6900, loss[loss=0.07718, simple_loss=0.1061, pruned_loss=0.01406, audio_tagging_loss=0.01005, over 14720.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08791, pruned_loss=0.01196, audio_tagging_loss=0.008591, over 3038585.98 frames. 
], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:11:18,669 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.747e+01 9.465e+01 1.018e+02 1.501e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 21:11:19,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3573080.0, ans=0.0 2023-11-26 21:11:26,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3573080.0, ans=0.0 2023-11-26 21:11:31,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3573146.6666666665, ans=0.0 2023-11-26 21:11:34,950 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.49 vs. limit=15.0 2023-11-26 21:11:50,325 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 21:11:57,688 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536000 2023-11-26 21:11:59,555 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-536000.pt 2023-11-26 21:12:04,334 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.34 vs. limit=12.0 2023-11-26 21:12:05,230 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 6950, loss[loss=0.06643, simple_loss=0.09527, pruned_loss=0.01046, audio_tagging_loss=0.008332, over 16517.00 frames. ], tot_loss[loss=0.06447, simple_loss=0.0878, pruned_loss=0.01202, audio_tagging_loss=0.008552, over 3044067.50 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:12:05,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3573346.6666666665, ans=0.125 2023-11-26 21:12:10,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3573346.6666666665, ans=0.05 2023-11-26 21:12:18,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3573413.3333333335, ans=0.2 2023-11-26 21:12:18,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3573413.3333333335, ans=0.1 2023-11-26 21:12:19,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3573413.3333333335, ans=0.125 2023-11-26 21:12:22,289 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.68 vs. 
limit=15.0 2023-11-26 21:12:25,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3573413.3333333335, ans=0.1 2023-11-26 21:12:49,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3573613.3333333335, ans=0.125 2023-11-26 21:12:56,845 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536050 2023-11-26 21:13:00,999 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7000, loss[loss=0.07978, simple_loss=0.1145, pruned_loss=0.01621, audio_tagging_loss=0.006308, over 14770.00 frames. ], tot_loss[loss=0.06428, simple_loss=0.08721, pruned_loss=0.01199, audio_tagging_loss=0.008688, over 3037567.98 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 21:13:05,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3573680.0, ans=0.0 2023-11-26 21:13:10,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3573746.6666666665, ans=0.035 2023-11-26 21:13:12,666 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.349e+01 8.901e+01 9.470e+01 1.019e+02 1.225e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-26 21:13:13,255 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.68 vs. limit=15.0 2023-11-26 21:13:22,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3573813.3333333335, ans=0.125 2023-11-26 21:13:28,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3573813.3333333335, ans=0.125 2023-11-26 21:13:30,414 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:13:41,670 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.08 vs. limit=15.0 2023-11-26 21:13:47,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3573946.6666666665, ans=0.1 2023-11-26 21:13:51,871 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536100 2023-11-26 21:13:56,032 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7050, loss[loss=0.06963, simple_loss=0.09166, pruned_loss=0.01392, audio_tagging_loss=0.009881, over 16027.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08882, pruned_loss=0.0121, audio_tagging_loss=0.008709, over 3038373.65 frames. 
], batch size: 61, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 21:13:58,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3574013.3333333335, ans=0.0 2023-11-26 21:14:00,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3574013.3333333335, ans=0.125 2023-11-26 21:14:07,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3574080.0, ans=0.1 2023-11-26 21:14:12,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3574080.0, ans=0.125 2023-11-26 21:14:21,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3574146.6666666665, ans=0.125 2023-11-26 21:14:24,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3574146.6666666665, ans=0.125 2023-11-26 21:14:42,609 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2023-11-26 21:14:45,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3574280.0, ans=0.125 2023-11-26 21:14:45,914 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.91 vs. limit=22.5 2023-11-26 21:14:46,442 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536150 2023-11-26 21:14:51,230 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7100, loss[loss=0.07059, simple_loss=0.09542, pruned_loss=0.01219, audio_tagging_loss=0.01069, over 16759.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08866, pruned_loss=0.01204, audio_tagging_loss=0.008741, over 3040502.31 frames. 
], batch size: 62, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 21:14:56,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3574346.6666666665, ans=0.0 2023-11-26 21:15:00,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3574346.6666666665, ans=0.125 2023-11-26 21:15:04,206 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.303e+01 8.863e+01 9.458e+01 1.036e+02 1.512e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-26 21:15:06,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3574413.3333333335, ans=0.125 2023-11-26 21:15:11,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3574413.3333333335, ans=0.125 2023-11-26 21:15:12,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3574480.0, ans=0.0 2023-11-26 21:15:26,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3574546.6666666665, ans=0.0 2023-11-26 21:15:43,239 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536200 2023-11-26 21:15:47,678 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7150, loss[loss=0.0684, simple_loss=0.09927, pruned_loss=0.01157, audio_tagging_loss=0.007195, over 16031.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08904, pruned_loss=0.0121, audio_tagging_loss=0.008764, over 3039336.29 frames. ], batch size: 62, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 21:16:03,836 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.03 vs. limit=15.0 2023-11-26 21:16:03,857 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.94 vs. limit=15.0 2023-11-26 21:16:34,164 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.51 vs. limit=12.0 2023-11-26 21:16:37,852 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536250 2023-11-26 21:16:42,033 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7200, loss[loss=0.08949, simple_loss=0.1136, pruned_loss=0.02112, audio_tagging_loss=0.01159, over 14245.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08891, pruned_loss=0.01209, audio_tagging_loss=0.00885, over 3043513.43 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:16:45,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3575013.3333333335, ans=0.125 2023-11-26 21:16:53,653 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.282e+01 8.982e+01 9.532e+01 1.041e+02 1.325e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 21:16:55,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3575080.0, ans=0.0 2023-11-26 21:17:32,470 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536300 2023-11-26 21:17:36,679 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7250, loss[loss=0.04585, simple_loss=0.0595, pruned_loss=0.006999, audio_tagging_loss=0.009104, over 14495.00 frames. 
], tot_loss[loss=0.06544, simple_loss=0.08898, pruned_loss=0.01208, audio_tagging_loss=0.008877, over 3037977.16 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:17:56,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3575413.3333333335, ans=0.125 2023-11-26 21:18:04,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3575480.0, ans=0.1 2023-11-26 21:18:28,377 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536350 2023-11-26 21:18:33,115 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7300, loss[loss=0.05956, simple_loss=0.07752, pruned_loss=0.01155, audio_tagging_loss=0.009257, over 13883.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.089, pruned_loss=0.01203, audio_tagging_loss=0.008819, over 3032933.46 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 21:18:45,941 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.580e+01 8.748e+01 9.464e+01 1.022e+02 1.262e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 21:18:53,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3575813.3333333335, ans=0.07 2023-11-26 21:19:10,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3575880.0, ans=0.0 2023-11-26 21:19:23,598 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536400 2023-11-26 21:19:28,018 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7350, loss[loss=0.07217, simple_loss=0.1012, pruned_loss=0.01426, audio_tagging_loss=0.007292, over 15648.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08899, pruned_loss=0.01205, audio_tagging_loss=0.008666, over 3038267.76 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 21:19:30,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3576013.3333333335, ans=0.1 2023-11-26 21:19:31,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3576013.3333333335, ans=0.0 2023-11-26 21:19:42,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3576080.0, ans=0.125 2023-11-26 21:19:49,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3576146.6666666665, ans=0.125 2023-11-26 21:20:16,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3576280.0, ans=0.125 2023-11-26 21:20:18,443 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536450 2023-11-26 21:20:18,977 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.62 vs. limit=15.0 2023-11-26 21:20:22,648 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7400, loss[loss=0.07334, simple_loss=0.09205, pruned_loss=0.01799, audio_tagging_loss=0.009327, over 14919.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08913, pruned_loss=0.012, audio_tagging_loss=0.008522, over 3034959.65 frames. 
], batch size: 55, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 21:20:31,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3576346.6666666665, ans=0.2 2023-11-26 21:20:36,525 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.430e+01 8.979e+01 9.560e+01 1.029e+02 2.303e+02, threshold=1.912e+02, percent-clipped=1.0 2023-11-26 21:21:04,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3576546.6666666665, ans=0.0 2023-11-26 21:21:08,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3576613.3333333335, ans=0.125 2023-11-26 21:21:13,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3576613.3333333335, ans=0.0 2023-11-26 21:21:14,564 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536500 2023-11-26 21:21:15,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3576613.3333333335, ans=0.125 2023-11-26 21:21:18,747 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7450, loss[loss=0.06255, simple_loss=0.08954, pruned_loss=0.01013, audio_tagging_loss=0.007652, over 15673.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08965, pruned_loss=0.01215, audio_tagging_loss=0.008475, over 3045013.28 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 21:21:50,332 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:21:57,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3576880.0, ans=0.1 2023-11-26 21:22:05,461 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.86 vs. limit=15.0 2023-11-26 21:22:08,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3576946.6666666665, ans=0.125 2023-11-26 21:22:09,793 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536550 2023-11-26 21:22:13,906 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7500, loss[loss=0.07405, simple_loss=0.1047, pruned_loss=0.01461, audio_tagging_loss=0.007111, over 16415.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08962, pruned_loss=0.01226, audio_tagging_loss=0.008401, over 3041301.81 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 21:22:26,634 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.500e+01 8.830e+01 9.434e+01 1.016e+02 1.615e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 21:23:04,451 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536600 2023-11-26 21:23:08,918 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7550, loss[loss=0.07035, simple_loss=0.09757, pruned_loss=0.01459, audio_tagging_loss=0.006976, over 14863.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08956, pruned_loss=0.01239, audio_tagging_loss=0.008518, over 3038385.74 frames. 
], batch size: 55, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 21:23:13,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3577346.6666666665, ans=0.125 2023-11-26 21:23:16,535 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.63 vs. limit=15.0 2023-11-26 21:23:32,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3577480.0, ans=0.1 2023-11-26 21:23:33,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3577480.0, ans=0.125 2023-11-26 21:23:43,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3577546.6666666665, ans=0.5 2023-11-26 21:24:00,015 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536650 2023-11-26 21:24:04,760 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7600, loss[loss=0.06493, simple_loss=0.08899, pruned_loss=0.01243, audio_tagging_loss=0.008, over 15925.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.0893, pruned_loss=0.01223, audio_tagging_loss=0.008556, over 3045330.64 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:24:16,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3577746.6666666665, ans=0.125 2023-11-26 21:24:17,500 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 8.794e+01 9.367e+01 9.817e+01 1.272e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 21:24:18,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3577746.6666666665, ans=0.125 2023-11-26 21:24:56,181 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536700 2023-11-26 21:24:56,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3577946.6666666665, ans=0.0 2023-11-26 21:25:00,345 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7650, loss[loss=0.05889, simple_loss=0.08141, pruned_loss=0.01141, audio_tagging_loss=0.006777, over 15340.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08908, pruned_loss=0.01214, audio_tagging_loss=0.008617, over 3035488.29 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:25:22,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3578146.6666666665, ans=0.1 2023-11-26 21:25:26,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3578146.6666666665, ans=0.07 2023-11-26 21:25:29,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3578146.6666666665, ans=0.5 2023-11-26 21:25:30,534 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.20 vs. 
limit=15.0 2023-11-26 21:25:36,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=3578213.3333333335, ans=12.0 2023-11-26 21:25:44,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3578280.0, ans=0.0 2023-11-26 21:25:51,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3578280.0, ans=0.5 2023-11-26 21:25:52,128 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536750 2023-11-26 21:25:56,393 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7700, loss[loss=0.06299, simple_loss=0.08156, pruned_loss=0.01133, audio_tagging_loss=0.01088, over 15452.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08872, pruned_loss=0.01202, audio_tagging_loss=0.008665, over 3034757.71 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:26:10,140 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.691e+01 8.777e+01 9.451e+01 1.024e+02 1.236e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-26 21:26:10,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3578413.3333333335, ans=0.05 2023-11-26 21:26:25,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3578480.0, ans=0.2 2023-11-26 21:26:36,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3578546.6666666665, ans=0.0 2023-11-26 21:26:39,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=3578613.3333333335, ans=10.0 2023-11-26 21:26:41,151 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:26:47,864 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536800 2023-11-26 21:26:52,884 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7750, loss[loss=0.08576, simple_loss=0.1222, pruned_loss=0.01624, audio_tagging_loss=0.008426, over 16057.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08912, pruned_loss=0.01207, audio_tagging_loss=0.008667, over 3033792.00 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:27:01,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3578680.0, ans=0.1 2023-11-26 21:27:06,755 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.49 vs. limit=22.5 2023-11-26 21:27:07,781 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2023-11-26 21:27:44,478 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536850 2023-11-26 21:27:46,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3578946.6666666665, ans=0.125 2023-11-26 21:27:48,647 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7800, loss[loss=0.07151, simple_loss=0.09833, pruned_loss=0.01247, audio_tagging_loss=0.009877, over 15395.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08881, pruned_loss=0.01202, audio_tagging_loss=0.008653, over 3033540.24 frames. 
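The Whitening lines compare a per-module diagnostic metric against a limit (e.g. metric=13.49 vs. limit=22.5): the metric stays small while a module's activations remain decorrelated, and exceeding the limit is what would trigger a corrective penalty. One plausible form of such a metric, a sketch only since the exact formula in scaling.py may differ, is d * sum(l_i^2) / (sum(l_i))^2 over the eigenvalues l_i of the channel covariance, which equals 1.0 for perfectly white activations and grows as variance concentrates in a few directions:

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations from one module.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]          # channel covariance, (C, C)
    d = cov.shape[0]
    # trace(cov @ cov) = sum of squared eigenvalues; trace(cov) = their sum.
    return d * (cov @ cov).diagonal().sum() / cov.diagonal().sum() ** 2

x = torch.randn(1000, 384)                  # roughly white input
print(float(whitening_metric(x)))           # near 1.0, far below limit=15.0
```

Against limits of 6.0, 15.0 or 22.5, the metrics logged in this section all sit below their limits, so these modules are currently inside their whitening constraints.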
], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:27:48,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3579013.3333333335, ans=0.0 2023-11-26 21:27:57,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3579013.3333333335, ans=0.125 2023-11-26 21:28:01,828 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.237e+01 9.125e+01 9.673e+01 1.032e+02 1.227e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-26 21:28:06,636 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.03 vs. limit=15.0 2023-11-26 21:28:12,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3579146.6666666665, ans=0.1 2023-11-26 21:28:24,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3579213.3333333335, ans=0.125 2023-11-26 21:28:31,724 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.57 vs. limit=12.0 2023-11-26 21:28:34,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3579280.0, ans=0.0 2023-11-26 21:28:39,490 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536900 2023-11-26 21:28:44,305 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7850, loss[loss=0.0643, simple_loss=0.07968, pruned_loss=0.01598, audio_tagging_loss=0.00848, over 14921.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08892, pruned_loss=0.01207, audio_tagging_loss=0.008597, over 3039862.28 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:28:56,049 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.41 vs. limit=6.0 2023-11-26 21:29:01,776 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.42 vs. limit=15.0 2023-11-26 21:29:15,022 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.05 vs. limit=15.0 2023-11-26 21:29:33,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3579613.3333333335, ans=0.125 2023-11-26 21:29:35,296 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 536950 2023-11-26 21:29:39,960 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7900, loss[loss=0.06063, simple_loss=0.08042, pruned_loss=0.01042, audio_tagging_loss=0.01, over 15754.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08879, pruned_loss=0.0119, audio_tagging_loss=0.008673, over 3048120.66 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:29:43,308 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.64 vs. 
limit=22.5 2023-11-26 21:29:50,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3579746.6666666665, ans=0.125 2023-11-26 21:29:53,828 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.143e+01 8.961e+01 9.633e+01 1.012e+02 1.259e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-26 21:30:13,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3579880.0, ans=0.2 2023-11-26 21:30:15,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3579880.0, ans=0.125 2023-11-26 21:30:32,174 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537000 2023-11-26 21:30:36,636 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 7950, loss[loss=0.04952, simple_loss=0.05697, pruned_loss=0.006413, audio_tagging_loss=0.01462, over 14941.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08867, pruned_loss=0.0119, audio_tagging_loss=0.008762, over 3043783.79 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:30:43,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3580013.3333333335, ans=0.2 2023-11-26 21:30:43,802 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.66 vs. limit=15.0 2023-11-26 21:30:50,453 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 21:30:54,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3580080.0, ans=0.0 2023-11-26 21:30:55,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3580080.0, ans=0.125 2023-11-26 21:31:00,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3580146.6666666665, ans=0.125 2023-11-26 21:31:09,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3580213.3333333335, ans=0.125 2023-11-26 21:31:09,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3580213.3333333335, ans=0.125 2023-11-26 21:31:13,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3580213.3333333335, ans=0.125 2023-11-26 21:31:27,711 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537050 2023-11-26 21:31:31,828 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8000, loss[loss=0.07586, simple_loss=0.1154, pruned_loss=0.0123, audio_tagging_loss=0.005848, over 14877.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08815, pruned_loss=0.01165, audio_tagging_loss=0.00891, over 3038887.29 frames. 
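The WARNING above, repeated for other cut IDs throughout this log, drops 1-second AudioSet clips whose placeholder transcript is longer than the acoustic sequence: 100 input frames shrink to 23 after subsampling, but the dummy sentence tokenizes into 24 BPE pieces, and a transducer loss cannot align fewer frames than output tokens. A sketch of such a filter; the exact subsampling arithmetic is an assumption chosen to reproduce 100 -> 23:

```python
def keep_cut(num_frames: int, num_tokens: int,
             subsampling_factor: int = 4) -> bool:
    # Assumed frame count after the convolutional subsampling frontend;
    # (100 - 7) // 4 = 23, matching the numbers in the warning.
    t = (num_frames - 7) // subsampling_factor
    # A transducer alignment needs at least one frame per token.
    return t >= num_tokens

print(keep_cut(100, 24))  # False -> the cut is excluded from training
```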
], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:31:39,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3580346.6666666665, ans=0.0 2023-11-26 21:31:45,566 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.179e+01 8.727e+01 9.223e+01 9.988e+01 1.687e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-26 21:32:08,159 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.35 vs. limit=15.0 2023-11-26 21:32:11,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3580546.6666666665, ans=0.125 2023-11-26 21:32:16,633 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.02 vs. limit=15.0 2023-11-26 21:32:22,930 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537100 2023-11-26 21:32:23,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3580613.3333333335, ans=0.1 2023-11-26 21:32:24,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3580613.3333333335, ans=0.125 2023-11-26 21:32:27,636 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8050, loss[loss=0.05738, simple_loss=0.08574, pruned_loss=0.007591, audio_tagging_loss=0.006918, over 14245.00 frames. ], tot_loss[loss=0.06453, simple_loss=0.08804, pruned_loss=0.01159, audio_tagging_loss=0.008921, over 3035291.57 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:33:19,945 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537150 2023-11-26 21:33:24,162 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8100, loss[loss=0.07955, simple_loss=0.1132, pruned_loss=0.01617, audio_tagging_loss=0.006791, over 14691.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08839, pruned_loss=0.01185, audio_tagging_loss=0.008903, over 3038340.02 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:33:32,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3581013.3333333335, ans=0.07 2023-11-26 21:33:36,870 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.566e+01 8.942e+01 9.751e+01 1.046e+02 1.316e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-26 21:34:04,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3581213.3333333335, ans=0.1 2023-11-26 21:34:15,183 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537200 2023-11-26 21:34:19,653 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8150, loss[loss=0.07108, simple_loss=0.1054, pruned_loss=0.01089, audio_tagging_loss=0.007471, over 15995.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08847, pruned_loss=0.01187, audio_tagging_loss=0.008817, over 3045501.62 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:34:29,933 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.77 vs. 
limit=15.0 2023-11-26 21:35:00,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3581546.6666666665, ans=0.125 2023-11-26 21:35:00,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3581546.6666666665, ans=0.07 2023-11-26 21:35:10,734 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537250 2023-11-26 21:35:15,016 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8200, loss[loss=0.05586, simple_loss=0.07925, pruned_loss=0.008881, audio_tagging_loss=0.007353, over 14358.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08871, pruned_loss=0.01194, audio_tagging_loss=0.00864, over 3042387.59 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:35:17,761 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 21:35:25,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3581680.0, ans=0.125 2023-11-26 21:35:29,848 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.292e+01 8.811e+01 9.586e+01 1.032e+02 1.518e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-26 21:35:35,729 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.52 vs. limit=22.5 2023-11-26 21:35:36,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3581746.6666666665, ans=0.1 2023-11-26 21:35:37,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3581813.3333333335, ans=0.07 2023-11-26 21:35:56,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3581880.0, ans=10.0 2023-11-26 21:36:08,183 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537300 2023-11-26 21:36:08,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3581946.6666666665, ans=0.1 2023-11-26 21:36:12,434 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8250, loss[loss=0.06316, simple_loss=0.08384, pruned_loss=0.009571, audio_tagging_loss=0.01167, over 15131.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08872, pruned_loss=0.01196, audio_tagging_loss=0.008645, over 3038820.04 frames. 
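Across these records the headline loss is consistent with a fixed linear combination of the logged components, loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss; for batch 8200 above, 0.5 * 0.08871 + 0.01194 + 0.00864 = 0.06494. A sketch with the weights inferred from the logged numbers rather than read from the training code:

```python
def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float,
                  simple_scale: float = 0.5,
                  audio_tagging_scale: float = 1.0) -> float:
    # Weights inferred by fitting the logged records; the simple (linear)
    # transducer loss is down-weighted relative to the pruned loss.
    return (simple_scale * simple_loss + pruned_loss
            + audio_tagging_scale * audio_tagging_loss)

# Reproduces the batch 8200 record to within rounding of the printed digits.
assert abs(combined_loss(0.08871, 0.01194, 0.00864) - 0.06494) < 1e-4
```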
], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:36:19,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3582013.3333333335, ans=0.0 2023-11-26 21:36:35,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3582146.6666666665, ans=0.0 2023-11-26 21:36:43,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3582213.3333333335, ans=0.125 2023-11-26 21:36:48,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3582213.3333333335, ans=0.2 2023-11-26 21:37:03,397 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537350 2023-11-26 21:37:07,513 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8300, loss[loss=0.05756, simple_loss=0.07458, pruned_loss=0.01224, audio_tagging_loss=0.008028, over 15480.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08901, pruned_loss=0.01206, audio_tagging_loss=0.00854, over 3043848.02 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:37:13,344 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.60 vs. limit=15.0 2023-11-26 21:37:20,178 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.577e+01 8.995e+01 9.587e+01 1.028e+02 1.257e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-26 21:37:27,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3582413.3333333335, ans=0.1 2023-11-26 21:37:57,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3582613.3333333335, ans=0.0 2023-11-26 21:37:58,393 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537400 2023-11-26 21:38:02,872 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8350, loss[loss=0.06982, simple_loss=0.09091, pruned_loss=0.01197, audio_tagging_loss=0.0124, over 16327.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08906, pruned_loss=0.01198, audio_tagging_loss=0.008539, over 3048368.29 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:38:17,150 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.47 vs. limit=22.5 2023-11-26 21:38:24,482 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.85 vs. limit=15.0 2023-11-26 21:38:28,078 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. 
limit=6.0 2023-11-26 21:38:30,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3582813.3333333335, ans=0.1 2023-11-26 21:38:54,698 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537450 2023-11-26 21:38:58,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3583013.3333333335, ans=0.125 2023-11-26 21:38:59,513 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8400, loss[loss=0.06567, simple_loss=0.09696, pruned_loss=0.009444, audio_tagging_loss=0.007745, over 16497.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08918, pruned_loss=0.01197, audio_tagging_loss=0.008544, over 3044858.16 frames. ], batch size: 62, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:39:13,341 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.643e+01 8.925e+01 9.429e+01 9.938e+01 1.352e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 21:39:28,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3583146.6666666665, ans=0.0 2023-11-26 21:39:28,524 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.41 vs. limit=12.0 2023-11-26 21:39:50,069 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537500 2023-11-26 21:39:54,184 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8450, loss[loss=0.07378, simple_loss=0.1048, pruned_loss=0.01294, audio_tagging_loss=0.008424, over 16052.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.09018, pruned_loss=0.01209, audio_tagging_loss=0.008528, over 3052949.41 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:40:06,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3583413.3333333335, ans=0.0 2023-11-26 21:40:33,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3583546.6666666665, ans=0.0 2023-11-26 21:40:44,873 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537550 2023-11-26 21:40:49,053 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8500, loss[loss=0.07653, simple_loss=0.1037, pruned_loss=0.01605, audio_tagging_loss=0.008623, over 16194.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08979, pruned_loss=0.01198, audio_tagging_loss=0.008591, over 3050955.78 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:40:51,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3583680.0, ans=0.125 2023-11-26 21:40:52,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3583680.0, ans=0.125 2023-11-26 21:40:53,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3583680.0, ans=0.2 2023-11-26 21:40:58,658 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.66 vs. 
limit=15.0 2023-11-26 21:41:04,449 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.148e+01 8.764e+01 9.533e+01 1.022e+02 1.336e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-26 21:41:28,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3583880.0, ans=0.0 2023-11-26 21:41:40,584 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537600 2023-11-26 21:41:45,587 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8550, loss[loss=0.06393, simple_loss=0.08748, pruned_loss=0.01244, audio_tagging_loss=0.007754, over 15629.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08904, pruned_loss=0.01182, audio_tagging_loss=0.008618, over 3043118.76 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:41:58,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3584080.0, ans=0.125 2023-11-26 21:42:04,850 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.48 vs. limit=15.0 2023-11-26 21:42:07,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3584146.6666666665, ans=0.1 2023-11-26 21:42:08,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3584146.6666666665, ans=0.125 2023-11-26 21:42:17,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3584213.3333333335, ans=0.05 2023-11-26 21:42:24,418 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2023-11-26 21:42:30,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3584280.0, ans=0.0 2023-11-26 21:42:37,274 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537650 2023-11-26 21:42:41,458 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8600, loss[loss=0.07654, simple_loss=0.1109, pruned_loss=0.01236, audio_tagging_loss=0.008743, over 16162.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08881, pruned_loss=0.01184, audio_tagging_loss=0.008721, over 3040797.47 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:42:49,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3584346.6666666665, ans=0.125 2023-11-26 21:42:49,430 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.87 vs. 
limit=15.0 2023-11-26 21:42:52,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3584413.3333333335, ans=10.0 2023-11-26 21:42:54,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3584413.3333333335, ans=0.125 2023-11-26 21:42:55,226 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.372e+01 8.837e+01 9.386e+01 1.001e+02 1.418e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 21:42:56,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3584413.3333333335, ans=0.0 2023-11-26 21:43:12,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3584480.0, ans=0.125 2023-11-26 21:43:14,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3584546.6666666665, ans=0.125 2023-11-26 21:43:28,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3584613.3333333335, ans=0.0 2023-11-26 21:43:32,604 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537700 2023-11-26 21:43:36,829 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8650, loss[loss=0.07981, simple_loss=0.1058, pruned_loss=0.0148, audio_tagging_loss=0.01212, over 15561.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08991, pruned_loss=0.01195, audio_tagging_loss=0.008803, over 3040013.33 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:43:38,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3584680.0, ans=0.125 2023-11-26 21:43:48,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3584746.6666666665, ans=0.1 2023-11-26 21:44:18,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3584880.0, ans=0.1 2023-11-26 21:44:28,159 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537750 2023-11-26 21:44:33,340 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8700, loss[loss=0.05731, simple_loss=0.07285, pruned_loss=0.01292, audio_tagging_loss=0.007969, over 15237.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09078, pruned_loss=0.01196, audio_tagging_loss=0.008762, over 3036116.28 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:44:49,091 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.824e+01 9.138e+01 9.810e+01 1.049e+02 1.289e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-26 21:45:04,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3585146.6666666665, ans=0.125 2023-11-26 21:45:19,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3585280.0, ans=0.0 2023-11-26 21:45:23,131 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.86 vs. 
limit=15.0 2023-11-26 21:45:24,809 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537800 2023-11-26 21:45:24,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3585280.0, ans=10.0 2023-11-26 21:45:29,262 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8750, loss[loss=0.05114, simple_loss=0.06769, pruned_loss=0.00824, audio_tagging_loss=0.009057, over 14528.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.09016, pruned_loss=0.01201, audio_tagging_loss=0.00882, over 3035019.36 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:45:34,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3585346.6666666665, ans=0.125 2023-11-26 21:45:39,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3585413.3333333335, ans=0.125 2023-11-26 21:45:43,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3585413.3333333335, ans=0.2 2023-11-26 21:45:58,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3585480.0, ans=0.1 2023-11-26 21:46:00,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3585480.0, ans=0.0 2023-11-26 21:46:18,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3585613.3333333335, ans=0.1 2023-11-26 21:46:20,579 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537850 2023-11-26 21:46:24,662 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8800, loss[loss=0.06336, simple_loss=0.09035, pruned_loss=0.01005, audio_tagging_loss=0.008135, over 15450.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09038, pruned_loss=0.01231, audio_tagging_loss=0.008881, over 3031621.47 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:46:32,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3585680.0, ans=0.1 2023-11-26 21:46:35,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3585746.6666666665, ans=0.0 2023-11-26 21:46:36,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3585746.6666666665, ans=0.1 2023-11-26 21:46:40,771 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.730e+01 8.993e+01 9.548e+01 1.016e+02 1.284e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-26 21:46:47,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3585813.3333333335, ans=0.125 2023-11-26 21:47:15,778 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537900 2023-11-26 21:47:20,528 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8850, loss[loss=0.07645, simple_loss=0.1052, pruned_loss=0.01563, audio_tagging_loss=0.008204, over 15912.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08981, pruned_loss=0.01214, audio_tagging_loss=0.008986, over 3032147.58 frames. 
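The grad_scale field hops between 8.0, 16.0 and 32.0 across this section (16.0 at batch 8750, 32.0 by batch 8800). That is the signature of dynamic loss scaling in fp16 training: the scale is halved whenever a step overflows and doubled again after a long run of stable steps. A sketch with PyTorch's stock GradScaler; model, optimizer, loss_fn, the batch layout and growth_interval are placeholders:

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

def train_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():       # forward in fp16 where safe
        loss = loss_fn(model(batch["inputs"]), batch["targets"])
    scaler.scale(loss).backward()         # backward on the scaled loss
    scaler.step(optimizer)                # skipped if gradients overflowed
    scaler.update()                       # halve on overflow, grow when stable
    return loss.detach(), scaler.get_scale()
```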
], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:47:21,036 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.75 vs. limit=22.5 2023-11-26 21:47:22,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3586013.3333333335, ans=0.125 2023-11-26 21:47:33,207 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 21:47:34,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3586080.0, ans=0.95 2023-11-26 21:47:55,698 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.06 vs. limit=15.0 2023-11-26 21:47:56,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3586213.3333333335, ans=0.125 2023-11-26 21:48:12,683 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 537950 2023-11-26 21:48:15,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3586346.6666666665, ans=0.125 2023-11-26 21:48:16,836 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8900, loss[loss=0.0629, simple_loss=0.08911, pruned_loss=0.009794, audio_tagging_loss=0.008552, over 16495.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09097, pruned_loss=0.01217, audio_tagging_loss=0.00875, over 3040458.50 frames. ], batch size: 62, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:48:19,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3586346.6666666665, ans=0.2 2023-11-26 21:48:28,554 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. 
limit=6.0 2023-11-26 21:48:33,338 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.240e+01 9.001e+01 9.576e+01 1.032e+02 1.288e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-26 21:48:36,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3586413.3333333335, ans=0.1 2023-11-26 21:48:47,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3586480.0, ans=0.125 2023-11-26 21:48:48,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3586480.0, ans=0.125 2023-11-26 21:48:58,625 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:48:59,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3586546.6666666665, ans=0.125 2023-11-26 21:49:00,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3586613.3333333335, ans=0.125 2023-11-26 21:49:06,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3586613.3333333335, ans=0.0 2023-11-26 21:49:07,880 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538000 2023-11-26 21:49:12,799 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 8950, loss[loss=0.06403, simple_loss=0.08793, pruned_loss=0.01154, audio_tagging_loss=0.008525, over 15051.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08997, pruned_loss=0.01198, audio_tagging_loss=0.008747, over 3046352.10 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:49:14,330 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.56 vs. limit=12.0 2023-11-26 21:49:32,045 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:49:34,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3586813.3333333335, ans=0.125 2023-11-26 21:49:35,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3586813.3333333335, ans=0.125 2023-11-26 21:50:03,817 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538050 2023-11-26 21:50:03,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3586946.6666666665, ans=0.1 2023-11-26 21:50:08,032 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9000, loss[loss=0.08381, simple_loss=0.1066, pruned_loss=0.02091, audio_tagging_loss=0.009592, over 16655.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08957, pruned_loss=0.01198, audio_tagging_loss=0.008716, over 3050752.18 frames. ], batch size: 63, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:50:08,034 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-26 21:50:40,490 INFO [train_asr.py:1267] (0/4) Epoch 45, validation: loss=0.05836, simple_loss=0.0505, pruned_loss=0.005274, audio_tagging_loss=0.02784, over 4681554.00 frames. 
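At batch 9000 the trainer pauses mid-epoch to compute a validation loss over the full dev set (about 4.7M frames here) and then reports the peak GPU memory allocated so far. A sketch of that interleaving; valid_interval, criterion and the loader are illustrative names, not the ones in train_asr.py:

```python
import torch

def maybe_validate(batch_idx, model, valid_loader, criterion,
                   valid_interval):
    if batch_idx % valid_interval != 0:
        return
    model.eval()
    tot, frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = criterion(model, batch)  # per-frame loss
            tot += float(loss) * num_frames
            frames += num_frames
    model.train()
    print(f"validation: loss={tot / frames:.4g}, over {frames:.2f} frames")
    print(f"Maximum memory allocated so far is "
          f"{torch.cuda.max_memory_allocated() // (1024 * 1024)}MB")
```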
2023-11-26 21:50:40,491 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-26 21:50:44,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3587013.3333333335, ans=0.0 2023-11-26 21:50:45,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3587013.3333333335, ans=0.0 2023-11-26 21:50:56,766 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.438e+01 8.868e+01 9.363e+01 9.972e+01 1.329e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 21:50:58,370 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.16 vs. limit=10.0 2023-11-26 21:51:02,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3587146.6666666665, ans=0.1 2023-11-26 21:51:07,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3587146.6666666665, ans=0.0 2023-11-26 21:51:31,230 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538100 2023-11-26 21:51:35,378 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9050, loss[loss=0.06514, simple_loss=0.09445, pruned_loss=0.009539, audio_tagging_loss=0.008373, over 16729.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.09011, pruned_loss=0.01205, audio_tagging_loss=0.008587, over 3059446.35 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:52:02,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3587480.0, ans=0.0 2023-11-26 21:52:26,839 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538150 2023-11-26 21:52:26,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3587613.3333333335, ans=0.1 2023-11-26 21:52:31,944 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9100, loss[loss=0.06987, simple_loss=0.08797, pruned_loss=0.01839, audio_tagging_loss=0.007494, over 15361.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.089, pruned_loss=0.01196, audio_tagging_loss=0.008592, over 3057251.97 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:52:33,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3587680.0, ans=0.125 2023-11-26 21:52:40,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3587680.0, ans=0.125 2023-11-26 21:52:49,594 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.208e+01 8.688e+01 9.524e+01 1.031e+02 1.397e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-26 21:52:52,190 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.60 vs. limit=15.0 2023-11-26 21:53:04,825 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.58 vs. 
limit=12.0 2023-11-26 21:53:21,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3587946.6666666665, ans=0.1 2023-11-26 21:53:23,836 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538200 2023-11-26 21:53:28,232 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9150, loss[loss=0.0638, simple_loss=0.09046, pruned_loss=0.0118, audio_tagging_loss=0.006774, over 15579.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.09005, pruned_loss=0.01213, audio_tagging_loss=0.008486, over 3055857.31 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 21:53:32,929 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.74 vs. limit=15.0 2023-11-26 21:53:39,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3588080.0, ans=0.2 2023-11-26 21:53:44,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3588080.0, ans=0.07 2023-11-26 21:53:45,720 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.84 vs. limit=12.0 2023-11-26 21:53:52,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3588146.6666666665, ans=0.2 2023-11-26 21:54:19,288 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538250 2023-11-26 21:54:23,516 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9200, loss[loss=0.07224, simple_loss=0.08969, pruned_loss=0.01915, audio_tagging_loss=0.00825, over 13903.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.09017, pruned_loss=0.01218, audio_tagging_loss=0.008573, over 3053049.62 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:54:42,044 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.381e+01 8.859e+01 9.629e+01 1.034e+02 1.503e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-26 21:55:15,040 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538300 2023-11-26 21:55:19,686 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9250, loss[loss=0.06737, simple_loss=0.09384, pruned_loss=0.01152, audio_tagging_loss=0.008939, over 15292.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08999, pruned_loss=0.01214, audio_tagging_loss=0.008608, over 3057330.44 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:55:22,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3588680.0, ans=0.5 2023-11-26 21:55:25,988 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0 2023-11-26 21:55:51,232 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.40 vs. 
limit=22.5 2023-11-26 21:55:54,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3588880.0, ans=0.1 2023-11-26 21:56:10,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3588946.6666666665, ans=0.0 2023-11-26 21:56:11,292 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538350 2023-11-26 21:56:15,967 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9300, loss[loss=0.06055, simple_loss=0.08168, pruned_loss=0.009235, audio_tagging_loss=0.01048, over 14793.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08891, pruned_loss=0.012, audio_tagging_loss=0.008685, over 3055743.17 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:56:17,460 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=22.5 2023-11-26 21:56:19,634 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.18 vs. limit=15.0 2023-11-26 21:56:32,741 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.167e+01 8.799e+01 9.431e+01 1.003e+02 1.401e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 21:56:34,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3589080.0, ans=0.1 2023-11-26 21:56:38,878 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.99 vs. limit=22.5 2023-11-26 21:56:43,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3589146.6666666665, ans=0.0 2023-11-26 21:57:07,052 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538400 2023-11-26 21:57:07,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3589280.0, ans=0.0 2023-11-26 21:57:07,222 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:57:07,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3589280.0, ans=0.1 2023-11-26 21:57:11,536 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9350, loss[loss=0.08529, simple_loss=0.1129, pruned_loss=0.02124, audio_tagging_loss=0.007587, over 15687.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08895, pruned_loss=0.01204, audio_tagging_loss=0.008594, over 3057450.19 frames. 
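Every progress record pairs a per-batch loss[... over ~1.5e4 frames] with a tot_loss[... over ~3.0e6 frames]. The total hovering around 200 batches' worth of frames is consistent with a frame-weighted aggregate that decays by a factor of about 1 - 1/200 per batch, so tot_loss tracks a smoothed recent average rather than the whole epoch. A sketch of that bookkeeping; the decay constant is inferred from the logged frame counts, not read from the code:

```python
class RunningLoss:
    """Decayed, frame-weighted running average of a per-frame loss."""

    def __init__(self, reset_interval: int = 200):
        self.alpha = 1.0 - 1.0 / reset_interval  # per-batch decay factor
        self.loss_sum = 0.0                      # decayed sum of loss * frames
        self.frames = 0.0                        # decayed sum of frames

    def update(self, batch_loss: float, batch_frames: float):
        self.loss_sum = self.alpha * self.loss_sum + batch_loss * batch_frames
        self.frames = self.alpha * self.frames + batch_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

# At ~15000 frames/batch the decayed frame total converges to roughly
# 15000 / (1 - alpha) = 3.0e6, matching the tot_loss records above.
```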
], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:57:12,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3589346.6666666665, ans=0.125 2023-11-26 21:57:16,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3589346.6666666665, ans=0.125 2023-11-26 21:57:19,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3589346.6666666665, ans=0.1 2023-11-26 21:57:22,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3589413.3333333335, ans=0.125 2023-11-26 21:57:44,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3589546.6666666665, ans=0.125 2023-11-26 21:58:01,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3589613.3333333335, ans=0.07 2023-11-26 21:58:02,233 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538450 2023-11-26 21:58:02,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3589613.3333333335, ans=0.125 2023-11-26 21:58:06,457 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9400, loss[loss=0.07024, simple_loss=0.1075, pruned_loss=0.009741, audio_tagging_loss=0.006772, over 15625.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08856, pruned_loss=0.01188, audio_tagging_loss=0.008696, over 3058729.47 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:58:07,145 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.38 vs. limit=15.0 2023-11-26 21:58:23,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3589746.6666666665, ans=0.0 2023-11-26 21:58:25,067 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.062e+01 9.009e+01 9.595e+01 1.056e+02 1.388e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-26 21:58:26,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3589746.6666666665, ans=0.1 2023-11-26 21:58:30,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3589813.3333333335, ans=0.125 2023-11-26 21:58:58,866 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538500 2023-11-26 21:59:03,564 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9450, loss[loss=0.06309, simple_loss=0.08014, pruned_loss=0.01354, audio_tagging_loss=0.009477, over 15305.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08811, pruned_loss=0.01179, audio_tagging_loss=0.008871, over 3056756.36 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:59:03,593 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 21:59:17,980 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.01 vs. limit=12.0 2023-11-26 21:59:19,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3590080.0, ans=0.1 2023-11-26 21:59:25,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3590146.6666666665, ans=0.125 2023-11-26 21:59:36,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3590213.3333333335, ans=0.125 2023-11-26 21:59:37,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3590213.3333333335, ans=0.125 2023-11-26 21:59:37,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3590213.3333333335, ans=0.0 2023-11-26 21:59:45,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3590213.3333333335, ans=0.0 2023-11-26 21:59:52,394 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.98 vs. limit=15.0 2023-11-26 21:59:55,075 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538550 2023-11-26 21:59:59,324 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9500, loss[loss=0.06267, simple_loss=0.07771, pruned_loss=0.01456, audio_tagging_loss=0.009258, over 14887.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08827, pruned_loss=0.01182, audio_tagging_loss=0.008898, over 3057251.93 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 22:00:15,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3590413.3333333335, ans=0.125 2023-11-26 22:00:18,519 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.889e+01 9.000e+01 9.693e+01 1.049e+02 2.337e+02, threshold=1.939e+02, percent-clipped=1.0 2023-11-26 22:00:22,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3590480.0, ans=0.125 2023-11-26 22:00:30,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3590480.0, ans=0.125 2023-11-26 22:00:38,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3590546.6666666665, ans=0.035 2023-11-26 22:00:50,782 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538600 2023-11-26 22:00:55,219 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9550, loss[loss=0.05545, simple_loss=0.07586, pruned_loss=0.007282, audio_tagging_loss=0.01024, over 14357.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.0875, pruned_loss=0.01173, audio_tagging_loss=0.008946, over 3051034.36 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 22:01:00,459 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.58 vs. 
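The token lists in these warnings are SentencePiece BPE output: '▁' marks a word boundary, and words absent from the BPE vocabulary split into fragments like '▁D' + 'ummy'. A sketch of reproducing such a tokenization; the model path is hypothetical and the exact pieces depend on the trained BPE model:

```python
import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.load("bpe.model")  # hypothetical path to the recipe's BPE model

pieces = sp.encode(
    "Dummy text added as a place holder. Please ignore this if possible.",
    out_type=str,
)
print(pieces)       # pieces like '▁D', 'ummy', ... as listed in the warning
print(len(pieces))  # the warning reports 24 tokens for this sentence
```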
limit=15.0 2023-11-26 22:01:01,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3590680.0, ans=0.0 2023-11-26 22:01:06,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3590746.6666666665, ans=0.125 2023-11-26 22:01:12,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3590746.6666666665, ans=0.05 2023-11-26 22:01:29,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3590880.0, ans=0.05 2023-11-26 22:01:31,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3590880.0, ans=0.125 2023-11-26 22:01:32,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3590880.0, ans=0.0 2023-11-26 22:01:42,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3590946.6666666665, ans=0.2 2023-11-26 22:01:46,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3590946.6666666665, ans=0.125 2023-11-26 22:01:47,410 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538650 2023-11-26 22:01:52,744 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9600, loss[loss=0.06758, simple_loss=0.09697, pruned_loss=0.01033, audio_tagging_loss=0.008755, over 15720.00 frames. ], tot_loss[loss=0.06402, simple_loss=0.08711, pruned_loss=0.0115, audio_tagging_loss=0.00897, over 3039527.46 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 22:02:00,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3591013.3333333335, ans=0.0 2023-11-26 22:02:01,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3591013.3333333335, ans=0.125 2023-11-26 22:02:04,652 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.48 vs. 
limit=15.0 2023-11-26 22:02:10,547 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.211e+01 8.846e+01 9.558e+01 1.014e+02 1.385e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-26 22:02:18,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3591146.6666666665, ans=0.125 2023-11-26 22:02:40,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3591280.0, ans=0.125 2023-11-26 22:02:40,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3591280.0, ans=0.125 2023-11-26 22:02:41,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3591280.0, ans=0.0 2023-11-26 22:02:41,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3591280.0, ans=0.125 2023-11-26 22:02:43,708 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538700 2023-11-26 22:02:47,915 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9650, loss[loss=0.05809, simple_loss=0.06857, pruned_loss=0.01155, audio_tagging_loss=0.01225, over 14116.00 frames. ], tot_loss[loss=0.06408, simple_loss=0.08714, pruned_loss=0.01155, audio_tagging_loss=0.008958, over 3034187.97 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 22:02:53,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3591346.6666666665, ans=0.125 2023-11-26 22:02:55,045 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.44 vs. limit=22.5 2023-11-26 22:02:58,179 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.21 vs. limit=12.0 2023-11-26 22:03:03,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3591413.3333333335, ans=0.0 2023-11-26 22:03:04,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3591413.3333333335, ans=0.125 2023-11-26 22:03:06,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3591413.3333333335, ans=0.1 2023-11-26 22:03:13,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3591480.0, ans=0.125 2023-11-26 22:03:25,277 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.30 vs. limit=15.0 2023-11-26 22:03:31,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3591613.3333333335, ans=0.125 2023-11-26 22:03:38,574 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538750 2023-11-26 22:03:41,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3591680.0, ans=0.125 2023-11-26 22:03:42,860 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9700, loss[loss=0.05628, simple_loss=0.07833, pruned_loss=0.009559, audio_tagging_loss=0.007553, over 15492.00 frames. 
], tot_loss[loss=0.0637, simple_loss=0.08699, pruned_loss=0.01141, audio_tagging_loss=0.00879, over 3036227.13 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 22:03:46,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3591680.0, ans=0.125 2023-11-26 22:03:55,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3591746.6666666665, ans=0.1 2023-11-26 22:04:02,690 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 8.825e+01 9.473e+01 1.012e+02 1.378e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-26 22:04:03,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3591746.6666666665, ans=0.125 2023-11-26 22:04:22,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3591880.0, ans=0.125 2023-11-26 22:04:34,594 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538800 2023-11-26 22:04:39,053 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9750, loss[loss=0.06902, simple_loss=0.09952, pruned_loss=0.01103, audio_tagging_loss=0.008231, over 15434.00 frames. ], tot_loss[loss=0.06351, simple_loss=0.08696, pruned_loss=0.01136, audio_tagging_loss=0.008678, over 3045100.06 frames. ], batch size: 54, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:05:02,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3592146.6666666665, ans=0.125 2023-11-26 22:05:04,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3592146.6666666665, ans=0.125 2023-11-26 22:05:16,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3592213.3333333335, ans=0.125 2023-11-26 22:05:18,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3592213.3333333335, ans=0.125 2023-11-26 22:05:21,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3592213.3333333335, ans=0.2 2023-11-26 22:05:22,201 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.56 vs. limit=12.0 2023-11-26 22:05:30,190 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538850 2023-11-26 22:05:31,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3592280.0, ans=0.0 2023-11-26 22:05:34,340 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9800, loss[loss=0.0428, simple_loss=0.05573, pruned_loss=0.005708, audio_tagging_loss=0.009224, over 15439.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08831, pruned_loss=0.01167, audio_tagging_loss=0.008659, over 3043791.11 frames. 
], batch size: 60, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:05:44,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3592413.3333333335, ans=0.0 2023-11-26 22:05:46,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3592413.3333333335, ans=0.2 2023-11-26 22:05:52,321 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.272e+01 8.733e+01 9.432e+01 1.005e+02 1.366e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 22:06:00,099 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:06:03,700 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.77 vs. limit=6.0 2023-11-26 22:06:08,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3592546.6666666665, ans=0.0 2023-11-26 22:06:13,240 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.16 vs. limit=15.0 2023-11-26 22:06:16,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3592546.6666666665, ans=0.125 2023-11-26 22:06:25,737 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 22:06:25,790 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538900 2023-11-26 22:06:28,456 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.62 vs. limit=15.0 2023-11-26 22:06:29,955 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9850, loss[loss=0.0526, simple_loss=0.07171, pruned_loss=0.008292, audio_tagging_loss=0.008451, over 15491.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08912, pruned_loss=0.01174, audio_tagging_loss=0.008624, over 3049859.03 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:06:44,337 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.86 vs. limit=15.0 2023-11-26 22:07:12,150 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0 2023-11-26 22:07:18,852 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.39 vs. limit=15.0 2023-11-26 22:07:21,330 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 538950 2023-11-26 22:07:26,033 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9900, loss[loss=0.07391, simple_loss=0.1041, pruned_loss=0.01252, audio_tagging_loss=0.009321, over 14395.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.09054, pruned_loss=0.01198, audio_tagging_loss=0.008565, over 3051205.34 frames. 
], batch size: 53, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:07:34,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3593013.3333333335, ans=0.2 2023-11-26 22:07:34,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3593013.3333333335, ans=0.0 2023-11-26 22:07:35,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3593013.3333333335, ans=0.125 2023-11-26 22:07:41,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3593080.0, ans=0.125 2023-11-26 22:07:44,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3593080.0, ans=0.1 2023-11-26 22:07:45,057 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.355e+01 9.069e+01 9.666e+01 1.030e+02 3.243e+02, threshold=1.933e+02, percent-clipped=1.0 2023-11-26 22:07:55,935 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:07:59,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3593213.3333333335, ans=0.0 2023-11-26 22:08:16,898 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539000 2023-11-26 22:08:21,870 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 9950, loss[loss=0.04172, simple_loss=0.05211, pruned_loss=0.005934, audio_tagging_loss=0.009731, over 14705.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.09028, pruned_loss=0.01198, audio_tagging_loss=0.008604, over 3048736.63 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:08:44,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3593480.0, ans=0.0 2023-11-26 22:09:01,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3593546.6666666665, ans=0.2 2023-11-26 22:09:12,880 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539050 2023-11-26 22:09:14,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3593613.3333333335, ans=0.125 2023-11-26 22:09:14,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3593613.3333333335, ans=0.0 2023-11-26 22:09:17,158 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10000, loss[loss=0.06293, simple_loss=0.07985, pruned_loss=0.01484, audio_tagging_loss=0.008167, over 15790.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08942, pruned_loss=0.01195, audio_tagging_loss=0.008614, over 3047973.78 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:09:18,816 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.85 vs. 
limit=15.0 2023-11-26 22:09:35,523 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.167e+01 8.750e+01 9.330e+01 1.017e+02 1.273e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-26 22:09:40,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3593813.3333333335, ans=0.125 2023-11-26 22:09:56,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3593880.0, ans=0.2 2023-11-26 22:10:07,563 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539100 2023-11-26 22:10:07,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3593946.6666666665, ans=0.125 2023-11-26 22:10:11,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3594013.3333333335, ans=0.125 2023-11-26 22:10:12,289 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10050, loss[loss=0.0722, simple_loss=0.1045, pruned_loss=0.01149, audio_tagging_loss=0.008487, over 15195.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08883, pruned_loss=0.01202, audio_tagging_loss=0.008603, over 3050388.55 frames. ], batch size: 58, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:10:17,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3594013.3333333335, ans=0.125 2023-11-26 22:10:18,283 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.67 vs. limit=15.0 2023-11-26 22:10:21,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3594013.3333333335, ans=0.0 2023-11-26 22:10:31,045 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.82 vs. limit=15.0 2023-11-26 22:10:39,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3594146.6666666665, ans=0.0 2023-11-26 22:11:02,036 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.25 vs. limit=15.0 2023-11-26 22:11:03,172 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539150 2023-11-26 22:11:05,695 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.62 vs. limit=15.0 2023-11-26 22:11:07,346 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10100, loss[loss=0.06301, simple_loss=0.07981, pruned_loss=0.01454, audio_tagging_loss=0.008564, over 14652.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08917, pruned_loss=0.01208, audio_tagging_loss=0.008702, over 3058101.56 frames. 
], batch size: 54, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:11:27,040 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.124e+01 9.131e+01 9.595e+01 1.046e+02 1.257e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-26 22:11:28,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3594480.0, ans=0.125 2023-11-26 22:11:48,681 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:11:53,694 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 22:11:58,078 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.93 vs. limit=10.0 2023-11-26 22:11:58,608 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539200 2023-11-26 22:12:03,084 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10150, loss[loss=0.06528, simple_loss=0.08411, pruned_loss=0.01462, audio_tagging_loss=0.008596, over 14239.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08995, pruned_loss=0.01214, audio_tagging_loss=0.008687, over 3056677.45 frames. ], batch size: 54, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:12:07,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3594680.0, ans=0.125 2023-11-26 22:12:14,422 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.12 vs. limit=15.0 2023-11-26 22:12:24,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3594813.3333333335, ans=0.125 2023-11-26 22:12:27,660 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:12:30,749 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 22:12:35,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3594880.0, ans=0.07 2023-11-26 22:12:48,774 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.28 vs. 
limit=6.0 2023-11-26 22:12:51,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3594946.6666666665, ans=0.07 2023-11-26 22:12:53,704 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539250 2023-11-26 22:12:56,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3594946.6666666665, ans=0.125 2023-11-26 22:12:58,452 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10200, loss[loss=0.06669, simple_loss=0.08072, pruned_loss=0.01559, audio_tagging_loss=0.01074, over 14988.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09039, pruned_loss=0.01236, audio_tagging_loss=0.008686, over 3064050.79 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:13:02,719 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.54 vs. limit=10.0 2023-11-26 22:13:08,137 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.66 vs. limit=12.0 2023-11-26 22:13:18,613 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.716e+01 9.041e+01 9.563e+01 1.048e+02 1.575e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-26 22:13:20,762 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 22:13:49,489 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539300 2023-11-26 22:13:54,228 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10250, loss[loss=0.06562, simple_loss=0.08699, pruned_loss=0.01359, audio_tagging_loss=0.008538, over 16005.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08972, pruned_loss=0.01225, audio_tagging_loss=0.008739, over 3057724.50 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:13:57,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3595346.6666666665, ans=0.1 2023-11-26 22:13:58,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3595346.6666666665, ans=0.0 2023-11-26 22:14:00,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3595346.6666666665, ans=0.125 2023-11-26 22:14:07,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3595413.3333333335, ans=0.125 2023-11-26 22:14:14,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3595413.3333333335, ans=0.125 2023-11-26 22:14:28,423 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.61 vs. 
limit=6.0 2023-11-26 22:14:29,616 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.49 vs. limit=15.0 2023-11-26 22:14:38,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3595613.3333333335, ans=0.125 2023-11-26 22:14:44,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3595613.3333333335, ans=0.125 2023-11-26 22:14:45,465 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539350 2023-11-26 22:14:49,566 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10300, loss[loss=0.06588, simple_loss=0.08772, pruned_loss=0.01249, audio_tagging_loss=0.009525, over 14873.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08916, pruned_loss=0.01212, audio_tagging_loss=0.008775, over 3060762.77 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:15:00,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3595746.6666666665, ans=0.125 2023-11-26 22:15:04,593 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.92 vs. limit=22.5 2023-11-26 22:15:10,409 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.546e+01 9.184e+01 9.815e+01 1.071e+02 1.317e+02, threshold=1.963e+02, percent-clipped=0.0 2023-11-26 22:15:13,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3595813.3333333335, ans=0.0 2023-11-26 22:15:41,416 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539400 2023-11-26 22:15:45,995 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10350, loss[loss=0.05379, simple_loss=0.06794, pruned_loss=0.0108, audio_tagging_loss=0.009018, over 14871.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08982, pruned_loss=0.01225, audio_tagging_loss=0.008773, over 3055937.36 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:16:00,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3596080.0, ans=0.2 2023-11-26 22:16:12,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3596146.6666666665, ans=0.125 2023-11-26 22:16:38,447 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539450 2023-11-26 22:16:43,140 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10400, loss[loss=0.05619, simple_loss=0.07483, pruned_loss=0.009473, audio_tagging_loss=0.0093, over 14218.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08913, pruned_loss=0.01207, audio_tagging_loss=0.00889, over 3051792.39 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:16:46,529 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:17:02,390 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.615e+01 8.954e+01 9.594e+01 1.032e+02 1.312e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-26 22:17:13,059 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.65 vs. 
limit=22.5 2023-11-26 22:17:34,661 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539500 2023-11-26 22:17:38,849 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10450, loss[loss=0.05023, simple_loss=0.07115, pruned_loss=0.005382, audio_tagging_loss=0.009276, over 15450.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08853, pruned_loss=0.01188, audio_tagging_loss=0.008847, over 3050417.61 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:17:43,699 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.40 vs. limit=15.0 2023-11-26 22:17:50,433 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.96 vs. limit=15.0 2023-11-26 22:18:04,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3596813.3333333335, ans=0.0 2023-11-26 22:18:10,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3596813.3333333335, ans=0.2 2023-11-26 22:18:22,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3596946.6666666665, ans=15.0 2023-11-26 22:18:29,977 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539550 2023-11-26 22:18:34,745 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10500, loss[loss=0.06334, simple_loss=0.08257, pruned_loss=0.01048, audio_tagging_loss=0.01157, over 14052.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08881, pruned_loss=0.01195, audio_tagging_loss=0.008778, over 3044597.41 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:18:52,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3597080.0, ans=0.125 2023-11-26 22:18:53,782 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.83 vs. limit=15.0 2023-11-26 22:18:55,227 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.808e+01 8.749e+01 9.296e+01 1.026e+02 1.262e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-26 22:18:56,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3597146.6666666665, ans=0.0 2023-11-26 22:19:00,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3597146.6666666665, ans=0.0 2023-11-26 22:19:18,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3597280.0, ans=0.125 2023-11-26 22:19:20,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3597280.0, ans=0.0 2023-11-26 22:19:26,559 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539600 2023-11-26 22:19:30,991 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10550, loss[loss=0.06492, simple_loss=0.08425, pruned_loss=0.01386, audio_tagging_loss=0.008929, over 15351.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08869, pruned_loss=0.01181, audio_tagging_loss=0.008614, over 3046406.66 frames. 
], batch size: 56, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:19:32,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3597346.6666666665, ans=0.025 2023-11-26 22:19:39,142 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:19:40,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3597346.6666666665, ans=0.125 2023-11-26 22:19:41,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3597413.3333333335, ans=0.0 2023-11-26 22:20:03,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3597546.6666666665, ans=0.0 2023-11-26 22:20:21,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3597613.3333333335, ans=0.0 2023-11-26 22:20:22,554 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539650 2023-11-26 22:20:26,731 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10600, loss[loss=0.06239, simple_loss=0.08836, pruned_loss=0.01248, audio_tagging_loss=0.005734, over 15317.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.09008, pruned_loss=0.0121, audio_tagging_loss=0.008411, over 3044491.15 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:20:34,768 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.53 vs. limit=15.0 2023-11-26 22:20:45,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3597746.6666666665, ans=0.125 2023-11-26 22:20:47,039 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.182e+01 8.604e+01 9.125e+01 9.885e+01 1.207e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-26 22:20:50,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3597813.3333333335, ans=0.1 2023-11-26 22:20:50,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3597813.3333333335, ans=0.125 2023-11-26 22:20:53,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3597813.3333333335, ans=0.2 2023-11-26 22:21:17,969 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539700 2023-11-26 22:21:22,183 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10650, loss[loss=0.06988, simple_loss=0.09564, pruned_loss=0.01356, audio_tagging_loss=0.008493, over 15275.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08882, pruned_loss=0.01175, audio_tagging_loss=0.008393, over 3044333.71 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:21:25,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3598013.3333333335, ans=0.125 2023-11-26 22:21:32,879 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. 
limit=6.0 2023-11-26 22:21:37,104 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0 2023-11-26 22:21:43,994 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.92 vs. limit=12.0 2023-11-26 22:22:04,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3598213.3333333335, ans=0.1 2023-11-26 22:22:14,151 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539750 2023-11-26 22:22:15,813 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.50 vs. limit=15.0 2023-11-26 22:22:18,302 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10700, loss[loss=0.05455, simple_loss=0.06793, pruned_loss=0.01051, audio_tagging_loss=0.01007, over 14912.00 frames. ], tot_loss[loss=0.06411, simple_loss=0.08781, pruned_loss=0.0117, audio_tagging_loss=0.008504, over 3040847.25 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:22:20,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3598346.6666666665, ans=10.0 2023-11-26 22:22:23,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3598346.6666666665, ans=0.04949747468305833 2023-11-26 22:22:27,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3598346.6666666665, ans=15.0 2023-11-26 22:22:37,961 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.681e+01 8.850e+01 9.452e+01 1.010e+02 1.228e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-26 22:22:42,036 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.26 vs. limit=5.0 2023-11-26 22:22:51,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3598546.6666666665, ans=0.1 2023-11-26 22:23:00,881 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:23:09,824 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539800 2023-11-26 22:23:10,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3598613.3333333335, ans=0.125 2023-11-26 22:23:12,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3598613.3333333335, ans=0.125 2023-11-26 22:23:14,297 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10750, loss[loss=0.07383, simple_loss=0.1018, pruned_loss=0.01282, audio_tagging_loss=0.0101, over 16060.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08876, pruned_loss=0.01188, audio_tagging_loss=0.00853, over 3044827.74 frames. 
], batch size: 59, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:23:25,028 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:23:26,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3598746.6666666665, ans=0.125 2023-11-26 22:23:26,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3598746.6666666665, ans=0.0 2023-11-26 22:23:32,611 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.64 vs. limit=22.5 2023-11-26 22:23:40,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3598813.3333333335, ans=0.125 2023-11-26 22:23:59,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3598946.6666666665, ans=0.2 2023-11-26 22:24:00,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3598946.6666666665, ans=0.1 2023-11-26 22:24:05,257 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539850 2023-11-26 22:24:07,938 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.98 vs. limit=10.0 2023-11-26 22:24:09,442 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10800, loss[loss=0.05067, simple_loss=0.06617, pruned_loss=0.005681, audio_tagging_loss=0.0119, over 15526.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08859, pruned_loss=0.01184, audio_tagging_loss=0.008618, over 3052717.14 frames. ], batch size: 60, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:24:14,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3599013.3333333335, ans=0.125 2023-11-26 22:24:31,740 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.551e+01 8.827e+01 9.312e+01 1.017e+02 1.289e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-26 22:24:43,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3599213.3333333335, ans=0.125 2023-11-26 22:25:01,009 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539900 2023-11-26 22:25:06,353 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10850, loss[loss=0.06701, simple_loss=0.08471, pruned_loss=0.01463, audio_tagging_loss=0.01003, over 14664.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08881, pruned_loss=0.01206, audio_tagging_loss=0.008564, over 3057876.71 frames. 
], batch size: 57, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:25:10,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3599346.6666666665, ans=0.125 2023-11-26 22:25:17,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3599413.3333333335, ans=0.125 2023-11-26 22:25:19,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3599413.3333333335, ans=0.0 2023-11-26 22:25:19,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3599413.3333333335, ans=0.125 2023-11-26 22:25:33,140 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.58 vs. limit=10.0 2023-11-26 22:25:38,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3599546.6666666665, ans=0.125 2023-11-26 22:25:57,971 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 539950 2023-11-26 22:25:58,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3599613.3333333335, ans=0.0 2023-11-26 22:26:00,073 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 22:26:02,140 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10900, loss[loss=0.05154, simple_loss=0.06747, pruned_loss=0.008303, audio_tagging_loss=0.009499, over 15536.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08812, pruned_loss=0.01197, audio_tagging_loss=0.008681, over 3049507.30 frames. ], batch size: 62, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:26:23,454 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.190e+01 9.085e+01 9.626e+01 1.024e+02 1.281e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-26 22:26:32,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3599813.3333333335, ans=0.0 2023-11-26 22:26:47,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3599946.6666666665, ans=0.125 2023-11-26 22:26:53,219 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540000 2023-11-26 22:26:54,527 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-540000.pt 2023-11-26 22:26:59,562 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 10950, loss[loss=0.06723, simple_loss=0.09512, pruned_loss=0.01356, audio_tagging_loss=0.006116, over 15635.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.0883, pruned_loss=0.01204, audio_tagging_loss=0.008722, over 3050797.11 frames. 
], batch size: 59, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:27:22,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3600146.6666666665, ans=0.0 2023-11-26 22:27:24,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3600146.6666666665, ans=0.0 2023-11-26 22:27:35,170 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.44 vs. limit=22.5 2023-11-26 22:27:38,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3600213.3333333335, ans=0.125 2023-11-26 22:27:43,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3600280.0, ans=0.125 2023-11-26 22:27:50,445 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540050 2023-11-26 22:27:55,675 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11000, loss[loss=0.05102, simple_loss=0.07058, pruned_loss=0.008024, audio_tagging_loss=0.007711, over 16103.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.0891, pruned_loss=0.01215, audio_tagging_loss=0.008728, over 3048838.39 frames. ], batch size: 61, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:28:07,292 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 22:28:14,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3600413.3333333335, ans=0.125 2023-11-26 22:28:17,881 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.702e+01 8.990e+01 9.480e+01 9.957e+01 3.729e+02, threshold=1.896e+02, percent-clipped=1.0 2023-11-26 22:28:18,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3600480.0, ans=0.125 2023-11-26 22:28:27,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3600546.6666666665, ans=0.09899494936611666 2023-11-26 22:28:29,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3600546.6666666665, ans=0.0 2023-11-26 22:28:43,224 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.93 vs. limit=10.0 2023-11-26 22:28:47,278 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540100 2023-11-26 22:28:52,034 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11050, loss[loss=0.07217, simple_loss=0.1029, pruned_loss=0.01115, audio_tagging_loss=0.009573, over 15583.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08873, pruned_loss=0.01213, audio_tagging_loss=0.008786, over 3041653.48 frames. 
], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:29:16,953 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. limit=6.0 2023-11-26 22:29:18,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3600813.3333333335, ans=0.125 2023-11-26 22:29:26,600 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.91 vs. limit=15.0 2023-11-26 22:29:34,573 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.35 vs. limit=22.5 2023-11-26 22:29:37,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3600946.6666666665, ans=0.125 2023-11-26 22:29:42,422 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540150 2023-11-26 22:29:44,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3600946.6666666665, ans=0.0 2023-11-26 22:29:46,544 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11100, loss[loss=0.07113, simple_loss=0.09324, pruned_loss=0.01389, audio_tagging_loss=0.01062, over 15186.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08837, pruned_loss=0.01229, audio_tagging_loss=0.008908, over 3041184.08 frames. ], batch size: 58, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:30:08,733 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.991e+01 9.032e+01 9.689e+01 1.034e+02 1.564e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-26 22:30:31,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3601280.0, ans=0.125 2023-11-26 22:30:35,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3601280.0, ans=0.125 2023-11-26 22:30:37,331 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540200 2023-11-26 22:30:42,448 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11150, loss[loss=0.06584, simple_loss=0.09074, pruned_loss=0.01313, audio_tagging_loss=0.00734, over 14810.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08807, pruned_loss=0.01221, audio_tagging_loss=0.008992, over 3045539.70 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:30:49,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3601346.6666666665, ans=0.2 2023-11-26 22:30:50,940 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.06 vs. 
limit=12.0 2023-11-26 22:31:01,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3601413.3333333335, ans=0.125 2023-11-26 22:31:06,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3601480.0, ans=0.125 2023-11-26 22:31:19,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3601546.6666666665, ans=0.1 2023-11-26 22:31:33,874 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540250 2023-11-26 22:31:38,612 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11200, loss[loss=0.05507, simple_loss=0.07246, pruned_loss=0.01078, audio_tagging_loss=0.008057, over 14205.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08823, pruned_loss=0.01214, audio_tagging_loss=0.008934, over 3049425.44 frames. ], batch size: 54, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:32:01,320 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.000e+01 8.768e+01 9.515e+01 1.029e+02 1.320e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-26 22:32:03,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3601813.3333333335, ans=0.2 2023-11-26 22:32:14,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3601880.0, ans=0.125 2023-11-26 22:32:16,810 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.70 vs. limit=22.5 2023-11-26 22:32:21,741 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.99 vs. limit=10.0 2023-11-26 22:32:30,199 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540300 2023-11-26 22:32:34,415 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11250, loss[loss=0.06492, simple_loss=0.09025, pruned_loss=0.01158, audio_tagging_loss=0.008217, over 15610.00 frames. ], tot_loss[loss=0.06405, simple_loss=0.08649, pruned_loss=0.01181, audio_tagging_loss=0.009003, over 3045354.70 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:32:35,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3602013.3333333335, ans=0.125 2023-11-26 22:32:37,309 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.78 vs. limit=10.0 2023-11-26 22:32:38,389 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.86 vs. limit=6.0 2023-11-26 22:32:45,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3602080.0, ans=0.125 2023-11-26 22:32:46,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3602080.0, ans=0.0 2023-11-26 22:33:21,812 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.23 vs. 
limit=15.0 2023-11-26 22:33:25,523 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540350 2023-11-26 22:33:26,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3602280.0, ans=0.025 2023-11-26 22:33:29,754 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11300, loss[loss=0.05655, simple_loss=0.078, pruned_loss=0.009646, audio_tagging_loss=0.007909, over 16283.00 frames. ], tot_loss[loss=0.06426, simple_loss=0.08729, pruned_loss=0.01184, audio_tagging_loss=0.008779, over 3048831.28 frames. ], batch size: 63, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:33:32,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3602346.6666666665, ans=0.125 2023-11-26 22:33:52,570 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.26 vs. limit=12.0 2023-11-26 22:33:54,061 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.703e+01 8.683e+01 9.336e+01 1.007e+02 1.340e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-26 22:33:56,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3602480.0, ans=0.2 2023-11-26 22:33:56,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3602480.0, ans=0.0 2023-11-26 22:34:21,800 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540400 2023-11-26 22:34:26,373 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11350, loss[loss=0.05597, simple_loss=0.07427, pruned_loss=0.009372, audio_tagging_loss=0.009456, over 14759.00 frames. ], tot_loss[loss=0.0642, simple_loss=0.08759, pruned_loss=0.01175, audio_tagging_loss=0.008659, over 3046677.64 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:34:33,880 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.53 vs. limit=22.5 2023-11-26 22:34:37,645 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.57 vs. limit=12.0 2023-11-26 22:34:45,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3602746.6666666665, ans=0.125 2023-11-26 22:34:49,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3602813.3333333335, ans=0.0 2023-11-26 22:34:52,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3602813.3333333335, ans=0.0 2023-11-26 22:35:04,919 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.56 vs. 
limit=15.0 2023-11-26 22:35:06,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3602880.0, ans=0.125 2023-11-26 22:35:16,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3602946.6666666665, ans=0.0 2023-11-26 22:35:17,901 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540450 2023-11-26 22:35:22,601 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11400, loss[loss=0.05014, simple_loss=0.06827, pruned_loss=0.008513, audio_tagging_loss=0.007489, over 15727.00 frames. ], tot_loss[loss=0.0644, simple_loss=0.08809, pruned_loss=0.01175, audio_tagging_loss=0.008605, over 3057067.01 frames. ], batch size: 61, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:35:25,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3603013.3333333335, ans=0.125 2023-11-26 22:35:29,488 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.08 vs. limit=22.5 2023-11-26 22:35:31,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=3603013.3333333335, ans=15.0 2023-11-26 22:35:35,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3603080.0, ans=0.125 2023-11-26 22:35:46,002 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.908e+01 8.997e+01 9.516e+01 1.035e+02 1.684e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-26 22:35:49,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3603146.6666666665, ans=0.125 2023-11-26 22:35:55,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3603213.3333333335, ans=0.025 2023-11-26 22:36:06,498 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.79 vs. limit=22.5 2023-11-26 22:36:13,496 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540500 2023-11-26 22:36:17,738 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11450, loss[loss=0.05231, simple_loss=0.07166, pruned_loss=0.008552, audio_tagging_loss=0.007931, over 14579.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08851, pruned_loss=0.01188, audio_tagging_loss=0.008609, over 3046596.04 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:36:22,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3603346.6666666665, ans=0.0 2023-11-26 22:36:23,069 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.87 vs. limit=15.0 2023-11-26 22:36:25,900 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.21 vs. 
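The scaling.py:1022 records compare a per-module whitening metric against a limit (metric=12.56 vs. limit=15.0 just above); the metric rises as the feature covariance within each channel group drifts away from isotropic, and the module only intervenes once the limit is exceeded. One plausible anisotropy measure is E[lambda^2] / (E[lambda])^2 over the covariance eigenvalues, which equals 1.0 for perfectly white features; the sketch below assumes that definition, which may differ from icefall's:

    import torch

    def whitening_metric(x, num_groups=1):
        """x: (num_frames, num_channels). Mean over groups of
        E[lambda^2] / (E[lambda])^2 of the per-group covariance."""
        n, c = x.shape
        g = c // num_groups
        vals = []
        for k in range(num_groups):
            xg = x[:, k * g:(k + 1) * g]
            xg = xg - xg.mean(dim=0, keepdim=True)
            cov = xg.T @ xg / n
            lam = torch.linalg.eigvalsh(cov)   # eigenvalues, ascending
            vals.append(((lam ** 2).mean() / lam.mean() ** 2).item())
        return sum(vals) / len(vals)

    x = torch.randn(1000, 384)      # near-white activations
    print(whitening_metric(x))      # ~1.4 here; strongly correlated
                                    # features push the metric far higher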
limit=22.5 2023-11-26 22:36:37,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3603413.3333333335, ans=0.1 2023-11-26 22:36:52,278 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.98 vs. limit=6.0 2023-11-26 22:37:09,942 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540550 2023-11-26 22:37:14,185 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11500, loss[loss=0.0817, simple_loss=0.1115, pruned_loss=0.01637, audio_tagging_loss=0.00956, over 15323.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08942, pruned_loss=0.01204, audio_tagging_loss=0.008522, over 3045587.01 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:37:14,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3603680.0, ans=0.125 2023-11-26 22:37:19,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3603680.0, ans=0.125 2023-11-26 22:37:20,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3603680.0, ans=0.2 2023-11-26 22:37:28,672 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:37:32,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3603746.6666666665, ans=0.2 2023-11-26 22:37:37,303 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.778e+01 8.979e+01 9.575e+01 1.038e+02 1.869e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-26 22:37:40,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3603813.3333333335, ans=0.1 2023-11-26 22:37:40,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3603813.3333333335, ans=0.125 2023-11-26 22:38:02,814 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.33 vs. limit=10.0 2023-11-26 22:38:05,443 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540600 2023-11-26 22:38:06,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3603946.6666666665, ans=0.0 2023-11-26 22:38:09,897 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11550, loss[loss=0.07315, simple_loss=0.0943, pruned_loss=0.01545, audio_tagging_loss=0.01055, over 16017.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08961, pruned_loss=0.0121, audio_tagging_loss=0.008505, over 3045959.33 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:38:41,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=3604146.6666666665, ans=0.1 2023-11-26 22:38:46,079 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 22:38:48,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3604213.3333333335, ans=0.2 2023-11-26 22:39:01,602 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540650 2023-11-26 22:39:05,817 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11600, loss[loss=0.06632, simple_loss=0.08455, pruned_loss=0.01431, audio_tagging_loss=0.009732, over 15243.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08869, pruned_loss=0.01201, audio_tagging_loss=0.008582, over 3049845.80 frames. ], batch size: 60, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:39:30,778 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.938e+01 9.802e+01 1.035e+02 1.553e+02, threshold=1.960e+02, percent-clipped=0.0 2023-11-26 22:39:55,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=3604613.3333333335, ans=0.1 2023-11-26 22:39:57,444 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540700 2023-11-26 22:40:02,169 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11650, loss[loss=0.06119, simple_loss=0.07423, pruned_loss=0.01324, audio_tagging_loss=0.01084, over 14411.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08816, pruned_loss=0.01188, audio_tagging_loss=0.008641, over 3042705.11 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:40:02,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3604680.0, ans=0.1 2023-11-26 22:40:03,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3604680.0, ans=0.125 2023-11-26 22:40:05,954 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.30 vs. 
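The WARNING above is a data-side guard rather than an error: AudioSet cuts carry a dummy transcript, and a 1-second cut of 100 feature frames shrinks to 23 frames after the convolutional frontend, fewer than its 24 BPE tokens, so no transducer alignment exists and the cut is skipped. A sketch of the check, assuming the usual ((T - 7) // 2 + 1) // 2 frontend reduction, which reproduces the 100 -> 23 figure in the warning:

    def frames_after_subsampling(num_frames: int) -> int:
        # Two stride-2 convolutions: T -> ((T - 7) // 2 + 1) // 2.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        """A transducer needs at least one encoder frame per output
        token; drop cuts whose subsampled length is shorter than the
        token sequence, since their loss would be infinite."""
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))   # 23, as in the warning
    print(keep_cut(100, 24))               # False -> cut excluded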
limit=10.0 2023-11-26 22:40:17,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3604746.6666666665, ans=0.125 2023-11-26 22:40:27,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3604813.3333333335, ans=0.015 2023-11-26 22:40:29,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3604813.3333333335, ans=0.125 2023-11-26 22:40:32,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3604813.3333333335, ans=0.125 2023-11-26 22:40:33,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3604813.3333333335, ans=0.125 2023-11-26 22:40:39,231 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:40:52,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3604946.6666666665, ans=0.0 2023-11-26 22:40:53,892 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540750 2023-11-26 22:40:58,075 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11700, loss[loss=0.07586, simple_loss=0.1042, pruned_loss=0.01656, audio_tagging_loss=0.007206, over 15958.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08907, pruned_loss=0.01203, audio_tagging_loss=0.008598, over 3050067.99 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:40:58,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3605013.3333333335, ans=0.125 2023-11-26 22:41:23,204 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.360e+01 8.888e+01 9.584e+01 1.031e+02 1.555e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-26 22:41:23,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3605146.6666666665, ans=0.05 2023-11-26 22:41:44,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3605280.0, ans=0.125 2023-11-26 22:41:49,352 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540800 2023-11-26 22:41:54,391 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11750, loss[loss=0.07631, simple_loss=0.1126, pruned_loss=0.01463, audio_tagging_loss=0.00539, over 15459.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.0888, pruned_loss=0.01202, audio_tagging_loss=0.008598, over 3050179.92 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:42:26,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3605546.6666666665, ans=0.125 2023-11-26 22:42:27,179 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.99 vs. 
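The scaling.py:1118 records (WithLoss: name=..., loss-sum=0.000e+00) suggest thin wrappers that let a tensor pass through unchanged while accumulating an auxiliary penalty on it for periodic logging, here attached to attention weights. A hypothetical sketch of that pattern; the penalty chosen and the class below are illustrative, not icefall's implementation:

    import torch

    class RecordAuxLoss(torch.autograd.Function):
        """Identity on x; records a penalty (here: mean squared
        activation) into a mutable accumulator for later logging."""
        @staticmethod
        def forward(ctx, x, accumulator):
            accumulator["loss_sum"] += x.detach().pow(2).mean().item()
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_out):
            return grad_out, None          # gradients pass through untouched

    acc = {"loss_sum": 0.0}
    x = torch.randn(4, 8, requires_grad=True)
    y = RecordAuxLoss.apply(x, acc)
    y.sum().backward()
    print(f"loss-sum={acc['loss_sum']:.3e}")   # mirrors the log format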
limit=15.0 2023-11-26 22:42:27,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3605546.6666666665, ans=0.125 2023-11-26 22:42:31,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3605546.6666666665, ans=0.125 2023-11-26 22:42:45,766 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540850 2023-11-26 22:42:50,423 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11800, loss[loss=0.06125, simple_loss=0.08714, pruned_loss=0.01107, audio_tagging_loss=0.006607, over 14084.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.0885, pruned_loss=0.012, audio_tagging_loss=0.008663, over 3050450.59 frames. ], batch size: 52, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:42:51,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3605680.0, ans=0.125 2023-11-26 22:43:02,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3605746.6666666665, ans=0.125 2023-11-26 22:43:14,287 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.320e+01 8.659e+01 9.498e+01 1.042e+02 1.310e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 22:43:19,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3605813.3333333335, ans=0.1 2023-11-26 22:43:39,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3605946.6666666665, ans=0.125 2023-11-26 22:43:42,109 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540900 2023-11-26 22:43:46,314 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11850, loss[loss=0.05911, simple_loss=0.07896, pruned_loss=0.01214, audio_tagging_loss=0.007486, over 15148.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08907, pruned_loss=0.01221, audio_tagging_loss=0.008704, over 3049701.66 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:43:51,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3606013.3333333335, ans=0.1 2023-11-26 22:43:51,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3606013.3333333335, ans=0.125 2023-11-26 22:44:05,467 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.48 vs. limit=15.0 2023-11-26 22:44:09,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3606146.6666666665, ans=0.1 2023-11-26 22:44:09,534 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.34 vs. 
limit=15.0 2023-11-26 22:44:21,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3606213.3333333335, ans=0.125 2023-11-26 22:44:22,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3606213.3333333335, ans=0.2 2023-11-26 22:44:36,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3606280.0, ans=0.05 2023-11-26 22:44:37,466 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 540950 2023-11-26 22:44:41,651 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11900, loss[loss=0.07219, simple_loss=0.1007, pruned_loss=0.01401, audio_tagging_loss=0.00781, over 16012.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08905, pruned_loss=0.01204, audio_tagging_loss=0.008667, over 3047828.82 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:44:41,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3606346.6666666665, ans=0.125 2023-11-26 22:44:46,056 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:44:47,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3606346.6666666665, ans=0.125 2023-11-26 22:45:01,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3606413.3333333335, ans=0.0 2023-11-26 22:45:06,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3606480.0, ans=0.0 2023-11-26 22:45:07,012 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 8.979e+01 9.678e+01 1.018e+02 1.926e+02, threshold=1.936e+02, percent-clipped=1.0 2023-11-26 22:45:23,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3606546.6666666665, ans=0.0 2023-11-26 22:45:23,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3606546.6666666665, ans=0.125 2023-11-26 22:45:30,099 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.39 vs. limit=6.0 2023-11-26 22:45:33,265 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541000 2023-11-26 22:45:38,254 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 11950, loss[loss=0.0546, simple_loss=0.08055, pruned_loss=0.007044, audio_tagging_loss=0.007284, over 15323.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08937, pruned_loss=0.0121, audio_tagging_loss=0.008847, over 3056290.55 frames. 
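In each train_asr.py:1235 record, loss[...] covers the current batch alone while tot_loss[...] is a frame-weighted running average whose frame count hovers near 3 million; the fractional counts (3047828.82 frames above) point to an exponentially forgetting accumulator rather than a plain sum. A sketch under that assumption, with an illustrative decay constant:

    class RunningLoss:
        """Frame-weighted running average with exponential forgetting."""
        def __init__(self, decay=0.995):
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss, batch_frames):
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames

        @property
        def value(self):
            return self.loss_sum / max(self.frames, 1.0)

    tracker = RunningLoss()
    tracker.update(0.07219, 16012)    # the batch 11900 record above
    print(f"tot_loss={tracker.value:.5f}, over {tracker.frames:.2f} frames")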
], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:45:39,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3606680.0, ans=0.0 2023-11-26 22:45:46,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3606680.0, ans=0.125 2023-11-26 22:45:48,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3606746.6666666665, ans=0.0 2023-11-26 22:45:59,459 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.45 vs. limit=15.0 2023-11-26 22:46:25,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3606946.6666666665, ans=0.1 2023-11-26 22:46:27,979 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541050 2023-11-26 22:46:31,995 INFO [train_asr.py:1235] (0/4) Epoch 45, batch 12000, loss[loss=0.04683, simple_loss=0.05811, pruned_loss=0.004189, audio_tagging_loss=0.01358, over 14576.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08884, pruned_loss=0.01192, audio_tagging_loss=0.009044, over 3055163.90 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:46:31,997 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-26 22:47:04,441 INFO [train_asr.py:1267] (0/4) Epoch 45, validation: loss=0.05747, simple_loss=0.05048, pruned_loss=0.005268, audio_tagging_loss=0.02696, over 4681554.00 frames. 2023-11-26 22:47:04,442 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-26 22:47:06,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3607013.3333333335, ans=0.125 2023-11-26 22:47:13,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3607080.0, ans=0.125 2023-11-26 22:47:15,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3607080.0, ans=0.125 2023-11-26 22:47:26,928 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.640e+01 9.102e+01 9.829e+01 1.057e+02 1.323e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-26 22:47:27,412 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.12 vs. limit=15.0 2023-11-26 22:47:28,432 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.86 vs. limit=12.0 2023-11-26 22:47:32,107 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-45.pt 2023-11-26 22:48:01,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3607186.6666666665, ans=0.0 2023-11-26 22:48:01,828 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.10 vs. limit=15.0 2023-11-26 22:48:02,328 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 0, loss[loss=0.05656, simple_loss=0.05759, pruned_loss=0.007288, audio_tagging_loss=0.02048, over 15198.00 frames. 
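Batch 12000 above triggers the periodic validation-and-save step: training pauses, the dev set is scored in one pass, the frame-weighted validation loss and the peak CUDA memory are logged, and epoch-45.pt is written before Epoch 46 starts. A condensed sketch of that sequence; compute_loss and the other helper names are placeholders, not the real train_asr.py API:

    import torch

    def validate_and_checkpoint(model, valid_loader, compute_loss,
                                exp_dir, epoch, device="cuda:0"):
        model.eval()
        loss_sum, frames = 0.0, 0
        with torch.no_grad():
            for batch in valid_loader:
                loss, num_frames = compute_loss(model, batch)
                loss_sum += loss.item() * num_frames
                frames += num_frames
        model.train()
        print(f"validation: loss={loss_sum / frames:.5f}, over {frames} frames")
        mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"Maximum memory allocated so far is {mb}MB")
        torch.save({"model": model.state_dict(), "epoch": epoch},
                   f"{exp_dir}/epoch-{epoch}.pt")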
], tot_loss[loss=0.05656, simple_loss=0.05759, pruned_loss=0.007288, audio_tagging_loss=0.02048, over 15198.00 frames. ], batch size: 58, lr: 1.48e-03, grad_scale: 32.0 2023-11-26 22:48:02,330 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-26 22:48:13,057 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.9676, 5.4806, 5.8391, 5.2110], device='cuda:0') 2023-11-26 22:48:14,058 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.5485, 2.4875, 2.4002, 2.2297, 2.6424, 2.5262, 2.6798, 2.6125], device='cuda:0') 2023-11-26 22:48:32,905 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5580, 2.4270, 4.3980, 3.2311], device='cuda:0') 2023-11-26 22:48:33,847 INFO [train_asr.py:1267] (0/4) Epoch 46, validation: loss=0.05779, simple_loss=0.05056, pruned_loss=0.005325, audio_tagging_loss=0.02718, over 4681554.00 frames. 2023-11-26 22:48:33,848 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-26 22:48:35,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3607186.6666666665, ans=0.125 2023-11-26 22:48:35,644 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.97 vs. limit=15.0 2023-11-26 22:48:36,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3607186.6666666665, ans=0.0 2023-11-26 22:48:50,734 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.53 vs. limit=15.0 2023-11-26 22:48:55,515 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541100 2023-11-26 22:48:55,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3607320.0, ans=0.0 2023-11-26 22:49:04,402 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:49:05,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3607386.6666666665, ans=0.04949747468305833 2023-11-26 22:49:28,969 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 50, loss[loss=0.08949, simple_loss=0.1178, pruned_loss=0.01611, audio_tagging_loss=0.01448, over 15141.00 frames. ], tot_loss[loss=0.07208, simple_loss=0.08963, pruned_loss=0.01115, audio_tagging_loss=0.01612, over 693108.53 frames. ], batch size: 54, lr: 1.48e-03, grad_scale: 32.0 2023-11-26 22:49:45,845 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.36 vs. 
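The zipformer.py:1877 records dump attn_weights_entropy during validation, one value per attention head; near-zero entries flag heads that have collapsed onto single positions, large entries flag heads attending almost uniformly. A sketch of that diagnostic, assuming weights normalized over the source axis; the exact reduction used in zipformer.py may differ:

    import torch

    def attn_weights_entropy(attn, eps=1e-20):
        """attn: (num_heads, tgt_len, src_len), rows summing to 1.
        Returns the mean attention entropy per head."""
        ent = -(attn * (attn + eps).log()).sum(dim=-1)   # (heads, tgt_len)
        return ent.mean(dim=-1)                          # one value per head

    w = torch.softmax(torch.randn(4, 10, 10), dim=-1)
    print(attn_weights_entropy(w))   # high ~ diffuse, near 0 ~ peaked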
limit=15.0 2023-11-26 22:49:50,757 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541150 2023-11-26 22:50:03,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3607720.0, ans=0.125 2023-11-26 22:50:20,083 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.136e+01 9.821e+01 1.049e+02 1.148e+02 1.594e+02, threshold=2.098e+02, percent-clipped=0.0 2023-11-26 22:50:23,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3607853.3333333335, ans=0.125 2023-11-26 22:50:24,342 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 100, loss[loss=0.08548, simple_loss=0.1079, pruned_loss=0.0188, audio_tagging_loss=0.01273, over 14462.00 frames. ], tot_loss[loss=0.07133, simple_loss=0.08862, pruned_loss=0.01132, audio_tagging_loss=0.0157, over 1213395.51 frames. ], batch size: 53, lr: 1.48e-03, grad_scale: 32.0 2023-11-26 22:50:35,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3607920.0, ans=0.2 2023-11-26 22:50:40,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3607920.0, ans=0.125 2023-11-26 22:50:47,217 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541200 2023-11-26 22:50:49,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3607986.6666666665, ans=0.0 2023-11-26 22:50:53,178 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.03 vs. limit=15.0 2023-11-26 22:50:54,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3607986.6666666665, ans=0.035 2023-11-26 22:51:10,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3608120.0, ans=0.0 2023-11-26 22:51:20,223 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 150, loss[loss=0.08281, simple_loss=0.1056, pruned_loss=0.01874, audio_tagging_loss=0.01124, over 14454.00 frames. ], tot_loss[loss=0.06974, simple_loss=0.08787, pruned_loss=0.01153, audio_tagging_loss=0.01428, over 1616607.71 frames. ], batch size: 57, lr: 1.48e-03, grad_scale: 32.0 2023-11-26 22:51:26,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3608186.6666666665, ans=0.0 2023-11-26 22:51:38,025 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.37 vs. 
limit=12.0 2023-11-26 22:51:43,330 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541250 2023-11-26 22:52:02,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3608386.6666666665, ans=0.1 2023-11-26 22:52:04,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3608453.3333333335, ans=0.2 2023-11-26 22:52:12,648 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.648e+01 9.369e+01 9.812e+01 1.037e+02 1.267e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-26 22:52:13,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=3608453.3333333335, ans=0.1 2023-11-26 22:52:16,851 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 200, loss[loss=0.09269, simple_loss=0.1328, pruned_loss=0.01981, audio_tagging_loss=0.006499, over 15055.00 frames. ], tot_loss[loss=0.06866, simple_loss=0.08862, pruned_loss=0.01162, audio_tagging_loss=0.01273, over 1932562.30 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 22:52:18,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3608520.0, ans=0.2 2023-11-26 22:52:38,622 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541300 2023-11-26 22:52:45,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3608653.3333333335, ans=0.1 2023-11-26 22:52:59,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3608720.0, ans=0.125 2023-11-26 22:53:10,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3608786.6666666665, ans=0.1 2023-11-26 22:53:11,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3608853.3333333335, ans=0.125 2023-11-26 22:53:12,705 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 250, loss[loss=0.05603, simple_loss=0.07776, pruned_loss=0.008787, audio_tagging_loss=0.008366, over 15524.00 frames. ], tot_loss[loss=0.06752, simple_loss=0.08874, pruned_loss=0.01163, audio_tagging_loss=0.01151, over 2175482.61 frames. 
], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 22:53:24,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3608920.0, ans=0.125 2023-11-26 22:53:35,457 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541350 2023-11-26 22:53:35,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3608986.6666666665, ans=0.07 2023-11-26 22:53:43,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3608986.6666666665, ans=0.0 2023-11-26 22:53:56,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3609053.3333333335, ans=0.0 2023-11-26 22:54:01,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3609120.0, ans=0.125 2023-11-26 22:54:05,262 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.789e+01 9.060e+01 9.556e+01 1.038e+02 1.375e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 22:54:09,071 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 300, loss[loss=0.07353, simple_loss=0.1032, pruned_loss=0.01555, audio_tagging_loss=0.006366, over 16345.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.08841, pruned_loss=0.01165, audio_tagging_loss=0.01069, over 2374124.51 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 22:54:11,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3609186.6666666665, ans=0.125 2023-11-26 22:54:18,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3609186.6666666665, ans=0.0 2023-11-26 22:54:32,034 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541400 2023-11-26 22:54:42,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3609386.6666666665, ans=0.125 2023-11-26 22:55:04,953 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 350, loss[loss=0.06553, simple_loss=0.08832, pruned_loss=0.0125, audio_tagging_loss=0.008869, over 15170.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.08957, pruned_loss=0.01181, audio_tagging_loss=0.01017, over 2530118.62 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 22:55:10,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3609520.0, ans=0.125 2023-11-26 22:55:17,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3609586.6666666665, ans=0.0 2023-11-26 22:55:18,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3609586.6666666665, ans=0.5 2023-11-26 22:55:27,484 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541450 2023-11-26 22:55:45,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3609720.0, ans=0.0 2023-11-26 22:55:57,956 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.483e+01 8.928e+01 9.545e+01 1.017e+02 1.635e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-26 22:56:01,281 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 400, loss[loss=0.08531, simple_loss=0.1226, pruned_loss=0.01538, audio_tagging_loss=0.008635, over 15796.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.08942, pruned_loss=0.01192, audio_tagging_loss=0.009855, over 2643160.58 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 22:56:07,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3609853.3333333335, ans=0.125 2023-11-26 22:56:10,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3609853.3333333335, ans=0.0 2023-11-26 22:56:23,738 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541500 2023-11-26 22:56:42,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3610053.3333333335, ans=0.0 2023-11-26 22:56:50,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3610120.0, ans=0.125 2023-11-26 22:56:56,551 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 450, loss[loss=0.04559, simple_loss=0.05773, pruned_loss=0.00588, audio_tagging_loss=0.01085, over 14520.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08867, pruned_loss=0.01183, audio_tagging_loss=0.009613, over 2728294.88 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 22:57:18,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3610320.0, ans=0.0 2023-11-26 22:57:20,020 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541550 2023-11-26 22:57:27,840 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.23 vs. limit=12.0 2023-11-26 22:57:43,388 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.99 vs. limit=12.0 2023-11-26 22:57:49,768 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.752e+01 9.008e+01 9.621e+01 1.046e+02 1.513e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-26 22:57:53,050 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 500, loss[loss=0.05786, simple_loss=0.07287, pruned_loss=0.009518, audio_tagging_loss=0.01191, over 15270.00 frames. 
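grad_scale steps between 16.0 and 32.0 throughout the section (batches 11550 vs. 11600 of epoch 45, and batch 400 here). That seesaw is the signature of dynamic fp16 loss scaling: the scale doubles after a long run of overflow-free steps and halves whenever a step yields inf/nan gradients, with the offending step skipped. The same behaviour with torch's stock GradScaler, intervals illustrative (icefall maintains its own scaler state):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_factor=2.0,
                                       backoff_factor=0.5,
                                       growth_interval=2000)

    def training_step(model, optimizer, loss_fn, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = loss_fn(model, batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)      # skipped internally if grads overflowed
        scaler.update()             # grows or backs off the scale
        return scaler.get_scale()   # swings 16.0 <-> 32.0 as in the log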
], tot_loss[loss=0.06599, simple_loss=0.0892, pruned_loss=0.01195, audio_tagging_loss=0.009435, over 2798638.29 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 22:58:01,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3610520.0, ans=0.2 2023-11-26 22:58:15,239 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541600 2023-11-26 22:58:47,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3610786.6666666665, ans=0.2 2023-11-26 22:58:48,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3610853.3333333335, ans=0.0 2023-11-26 22:58:49,535 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 550, loss[loss=0.07734, simple_loss=0.1084, pruned_loss=0.01302, audio_tagging_loss=0.01013, over 16113.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08905, pruned_loss=0.01197, audio_tagging_loss=0.009271, over 2856412.14 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 22:58:54,267 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.55 vs. limit=12.0 2023-11-26 22:59:03,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3610920.0, ans=0.0 2023-11-26 22:59:11,749 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541650 2023-11-26 22:59:20,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3610986.6666666665, ans=0.0 2023-11-26 22:59:22,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3611053.3333333335, ans=0.125 2023-11-26 22:59:31,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3611053.3333333335, ans=0.1 2023-11-26 22:59:41,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3611120.0, ans=0.1 2023-11-26 22:59:42,595 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.191e+01 8.846e+01 9.307e+01 1.018e+02 1.266e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 22:59:44,782 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 600, loss[loss=0.05717, simple_loss=0.07718, pruned_loss=0.0094, audio_tagging_loss=0.009176, over 15186.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08877, pruned_loss=0.01188, audio_tagging_loss=0.009079, over 2899833.90 frames. 
], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 22:59:46,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3611186.6666666665, ans=0.125 2023-11-26 22:59:57,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3611253.3333333335, ans=0.125 2023-11-26 23:00:00,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3611253.3333333335, ans=0.1 2023-11-26 23:00:01,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3611253.3333333335, ans=0.125 2023-11-26 23:00:07,202 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541700 2023-11-26 23:00:09,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3611320.0, ans=0.125 2023-11-26 23:00:13,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3611320.0, ans=0.0 2023-11-26 23:00:17,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3611320.0, ans=0.125 2023-11-26 23:00:23,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3611386.6666666665, ans=0.1 2023-11-26 23:00:24,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3611386.6666666665, ans=0.125 2023-11-26 23:00:31,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3611453.3333333335, ans=0.1 2023-11-26 23:00:41,241 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 650, loss[loss=0.06284, simple_loss=0.0836, pruned_loss=0.01084, audio_tagging_loss=0.0102, over 15692.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08855, pruned_loss=0.01198, audio_tagging_loss=0.009066, over 2934377.43 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:00:44,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3611520.0, ans=0.025 2023-11-26 23:00:52,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3611586.6666666665, ans=0.0 2023-11-26 23:01:03,886 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541750 2023-11-26 23:01:31,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3611786.6666666665, ans=0.125 2023-11-26 23:01:35,261 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.404e+01 8.767e+01 9.555e+01 1.054e+02 1.204e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 23:01:37,483 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 700, loss[loss=0.07541, simple_loss=0.1061, pruned_loss=0.01361, audio_tagging_loss=0.008746, over 14243.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08865, pruned_loss=0.01184, audio_tagging_loss=0.009017, over 2961903.85 frames. 
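Many ScheduledFloat names above belong to balancers; in those records min_positive=0.025 and min_abs=0.1 are per-channel constraint floors and prob=0.125 is the probability that the constraint is enforced on a given step. A hypothetical reading of those knobs as a module that emits an auxiliary penalty; the real Balancer in scaling.py reportedly applies its correction to gradients in the backward pass instead, so treat this only as an illustration of what the logged values mean:

    import torch

    class SimpleBalancer(torch.nn.Module):
        """With probability `prob`, emit a penalty pushing each channel's
        mean |x| above `min_abs` and its (soft) fraction of positive
        values above `min_positive`. Returns (x, penalty)."""
        def __init__(self, prob=0.125, min_abs=0.1, min_positive=0.025):
            super().__init__()
            self.prob = prob
            self.min_abs = min_abs
            self.min_positive = min_positive

        def forward(self, x):
            if not self.training or torch.rand(()).item() > self.prob:
                return x, x.new_zeros(())
            mean_abs = x.abs().mean(dim=0)
            soft_pos = torch.sigmoid(4.0 * x).mean(dim=0)  # smooth sign proxy
            penalty = ((self.min_abs - mean_abs).clamp(min=0).sum() +
                       (self.min_positive - soft_pos).clamp(min=0).sum())
            return x, penalty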
], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:01:45,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3611853.3333333335, ans=0.0 2023-11-26 23:01:59,843 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541800 2023-11-26 23:02:08,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3611986.6666666665, ans=0.125 2023-11-26 23:02:11,007 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.70 vs. limit=10.0 2023-11-26 23:02:33,560 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 750, loss[loss=0.07071, simple_loss=0.09822, pruned_loss=0.01447, audio_tagging_loss=0.007123, over 16183.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08976, pruned_loss=0.01212, audio_tagging_loss=0.00894, over 2989408.54 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:02:50,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3612253.3333333335, ans=0.0 2023-11-26 23:02:55,948 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541850 2023-11-26 23:02:57,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3612320.0, ans=0.125 2023-11-26 23:03:27,825 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.932e+01 9.015e+01 9.591e+01 1.028e+02 1.389e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-26 23:03:29,966 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 800, loss[loss=0.09971, simple_loss=0.1389, pruned_loss=0.02446, audio_tagging_loss=0.005801, over 15722.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09084, pruned_loss=0.01235, audio_tagging_loss=0.008939, over 3009246.43 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:03:33,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3612520.0, ans=0.125 2023-11-26 23:03:33,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3612520.0, ans=0.07 2023-11-26 23:03:42,468 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.16 vs. 
limit=15.0 2023-11-26 23:03:52,665 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541900 2023-11-26 23:03:58,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3612653.3333333335, ans=0.1 2023-11-26 23:03:59,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3612653.3333333335, ans=0.1 2023-11-26 23:04:07,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3612720.0, ans=0.125 2023-11-26 23:04:11,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3612720.0, ans=0.125 2023-11-26 23:04:14,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3612786.6666666665, ans=0.125 2023-11-26 23:04:14,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3612786.6666666665, ans=0.07 2023-11-26 23:04:21,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3612786.6666666665, ans=0.125 2023-11-26 23:04:25,727 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 850, loss[loss=0.06228, simple_loss=0.074, pruned_loss=0.01358, audio_tagging_loss=0.0117, over 14512.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09046, pruned_loss=0.01218, audio_tagging_loss=0.008918, over 3021755.18 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:04:48,342 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 541950 2023-11-26 23:05:04,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3613053.3333333335, ans=0.0 2023-11-26 23:05:16,693 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.10 vs. limit=15.0 2023-11-26 23:05:19,276 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.559e+01 8.857e+01 9.463e+01 1.019e+02 1.516e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 23:05:21,995 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 900, loss[loss=0.0518, simple_loss=0.07183, pruned_loss=0.006524, audio_tagging_loss=0.009359, over 15158.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09067, pruned_loss=0.01218, audio_tagging_loss=0.008912, over 3032675.97 frames. 
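The bypass.scale_min (ans=0.2) and bypass.skip_rate (ans=0.07) values above parameterize the residual mixers around zipformer blocks: a block's output is blended with its input through a learnable per-channel scale clamped below by scale_min, so no layer can turn itself fully off. A sketch of the blend, assuming that reading and omitting the skip_rate and scheduling details:

    import torch

    class Bypass(torch.nn.Module):
        """out = x + scale * (y - x), with a learnable per-channel scale
        clamped to [scale_min, 1.0]; scale_min=0.2 in the log keeps each
        block at least 20% active."""
        def __init__(self, num_channels, scale_min=0.2):
            super().__init__()
            self.scale = torch.nn.Parameter(torch.full((num_channels,), 0.5))
            self.scale_min = scale_min

        def forward(self, x, y):
            s = self.scale.clamp(self.scale_min, 1.0)
            return x + s * (y - x)

    blk = Bypass(num_channels=384)
    x = torch.randn(10, 384)        # block input
    y = torch.randn(10, 384)        # stand-in for the block's output
    print(blk(x, y).shape)          # torch.Size([10, 384])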
], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:05:25,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3613186.6666666665, ans=0.125 2023-11-26 23:05:32,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3613253.3333333335, ans=0.05 2023-11-26 23:05:42,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3613253.3333333335, ans=0.0 2023-11-26 23:05:43,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3613320.0, ans=0.125 2023-11-26 23:05:44,498 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542000 2023-11-26 23:06:12,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3613453.3333333335, ans=0.125 2023-11-26 23:06:18,758 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 950, loss[loss=0.05023, simple_loss=0.06681, pruned_loss=0.007499, audio_tagging_loss=0.009331, over 15931.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.091, pruned_loss=0.0122, audio_tagging_loss=0.008856, over 3043956.03 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:06:40,878 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542050 2023-11-26 23:06:41,300 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.55 vs. limit=12.0 2023-11-26 23:06:42,369 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.94 vs. limit=15.0 2023-11-26 23:06:57,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3613720.0, ans=0.0 2023-11-26 23:07:03,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3613786.6666666665, ans=0.2 2023-11-26 23:07:03,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3613786.6666666665, ans=0.0 2023-11-26 23:07:05,635 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.39 vs. limit=5.0 2023-11-26 23:07:11,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3613786.6666666665, ans=0.2 2023-11-26 23:07:13,290 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.799e+01 8.708e+01 9.328e+01 1.031e+02 1.282e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-26 23:07:13,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3613853.3333333335, ans=0.0 2023-11-26 23:07:14,453 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1000, loss[loss=0.0693, simple_loss=0.09853, pruned_loss=0.01302, audio_tagging_loss=0.007022, over 15805.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.09036, pruned_loss=0.01205, audio_tagging_loss=0.008709, over 3043365.34 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:07:15,259 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.58 vs. 
limit=15.0 2023-11-26 23:07:15,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3613853.3333333335, ans=0.0 2023-11-26 23:07:24,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3613920.0, ans=0.1 2023-11-26 23:07:26,375 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.01 vs. limit=15.0 2023-11-26 23:07:33,155 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.99 vs. limit=15.0 2023-11-26 23:07:37,451 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542100 2023-11-26 23:07:38,447 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 23:07:47,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3614053.3333333335, ans=0.0 2023-11-26 23:07:47,658 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:07:55,565 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.78 vs. limit=10.0 2023-11-26 23:08:10,480 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1050, loss[loss=0.06215, simple_loss=0.08528, pruned_loss=0.01139, audio_tagging_loss=0.008121, over 14518.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08929, pruned_loss=0.012, audio_tagging_loss=0.008713, over 3041545.72 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:08:28,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3614253.3333333335, ans=0.2 2023-11-26 23:08:33,555 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542150 2023-11-26 23:08:57,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3614453.3333333335, ans=0.125 2023-11-26 23:08:58,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3614453.3333333335, ans=0.0 2023-11-26 23:09:05,470 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.666e+01 8.633e+01 9.310e+01 1.034e+02 1.583e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-26 23:09:06,534 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1100, loss[loss=0.07013, simple_loss=0.09551, pruned_loss=0.01294, audio_tagging_loss=0.009439, over 14664.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08861, pruned_loss=0.01193, audio_tagging_loss=0.008695, over 3044690.06 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:09:09,287 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 23:09:12,912 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.41 vs. limit=15.0 2023-11-26 23:09:15,808 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:09:15,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3614520.0, ans=0.09899494936611666 2023-11-26 23:09:24,321 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.15 vs. limit=22.5 2023-11-26 23:09:29,055 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542200 2023-11-26 23:09:31,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3614653.3333333335, ans=0.125 2023-11-26 23:09:46,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3614720.0, ans=0.125 2023-11-26 23:09:54,784 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=15.0 2023-11-26 23:10:02,937 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1150, loss[loss=0.05215, simple_loss=0.0746, pruned_loss=0.005406, audio_tagging_loss=0.009442, over 15531.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08874, pruned_loss=0.01182, audio_tagging_loss=0.008651, over 3050744.85 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:10:07,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3614853.3333333335, ans=15.0 2023-11-26 23:10:12,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3614853.3333333335, ans=0.125 2023-11-26 23:10:13,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3614920.0, ans=0.125 2023-11-26 23:10:25,019 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542250 2023-11-26 23:10:42,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3615053.3333333335, ans=0.1 2023-11-26 23:10:46,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3615053.3333333335, ans=0.125 2023-11-26 23:10:46,289 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.09 vs. 
limit=15.0 2023-11-26 23:10:57,960 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.595e+01 8.828e+01 9.396e+01 9.982e+01 1.339e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-26 23:10:58,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3615186.6666666665, ans=0.125 2023-11-26 23:10:59,048 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1200, loss[loss=0.06993, simple_loss=0.09497, pruned_loss=0.01292, audio_tagging_loss=0.009522, over 14944.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08916, pruned_loss=0.01189, audio_tagging_loss=0.008549, over 3049109.58 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:10:59,853 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.27 vs. limit=15.0 2023-11-26 23:11:04,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3615186.6666666665, ans=0.1 2023-11-26 23:11:06,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3615186.6666666665, ans=0.1 2023-11-26 23:11:21,960 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542300 2023-11-26 23:11:22,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3615320.0, ans=0.0 2023-11-26 23:11:25,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3615320.0, ans=0.125 2023-11-26 23:11:37,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3615386.6666666665, ans=0.0 2023-11-26 23:11:48,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3615453.3333333335, ans=0.125 2023-11-26 23:11:48,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3615453.3333333335, ans=0.0 2023-11-26 23:11:54,921 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1250, loss[loss=0.05228, simple_loss=0.07169, pruned_loss=0.007129, audio_tagging_loss=0.009303, over 15637.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.089, pruned_loss=0.01193, audio_tagging_loss=0.008504, over 3048725.50 frames. 
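
The ScheduledFloat entries record module constants such as dropout probabilities, skip rates and balancer limits that are annealed against the global batch count; the logged ans is the value in effect at that batch_count. A minimal sketch of a piecewise-linear schedule in that spirit, assuming the (batch_count, value) breakpoint form used by icefall's scaling.py; the names and breakpoints here are illustrative, not the upstream API:

    # Hedged sketch: piecewise-linear annealing over the global batch count.
    def scheduled_float(schedule, batch_count):
        """schedule: sorted [(batch_count, value), ...] breakpoints."""
        if batch_count <= schedule[0][0]:
            return schedule[0][1]
        if batch_count >= schedule[-1][0]:
            return schedule[-1][1]
        for (x0, y0), (x1, y1) in zip(schedule, schedule[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

    # A skip rate decaying from 0.5 to 0.0 over the first 20k batches has long
    # since reached its endpoint at batch_count ~3.6e6, hence the ans=0.0 lines:
    print(scheduled_float([(0, 0.5), (20000, 0.0)], 3615186))  # 0.0
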
], batch size: 61, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:12:06,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3615586.6666666665, ans=0.125 2023-11-26 23:12:17,453 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542350 2023-11-26 23:12:26,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3615653.3333333335, ans=0.125 2023-11-26 23:12:26,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3615653.3333333335, ans=0.0 2023-11-26 23:12:34,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3615720.0, ans=0.0 2023-11-26 23:12:37,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3615720.0, ans=0.1 2023-11-26 23:12:43,202 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.40 vs. limit=15.0 2023-11-26 23:12:49,582 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.75 vs. limit=15.0 2023-11-26 23:12:49,946 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.362e+01 8.892e+01 9.390e+01 1.002e+02 1.440e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-26 23:12:51,052 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1300, loss[loss=0.07187, simple_loss=0.103, pruned_loss=0.01326, audio_tagging_loss=0.007107, over 16262.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08858, pruned_loss=0.0119, audio_tagging_loss=0.008531, over 3050827.63 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:12:53,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3615853.3333333335, ans=0.125 2023-11-26 23:13:01,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3615920.0, ans=0.125 2023-11-26 23:13:07,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3615920.0, ans=0.125 2023-11-26 23:13:12,839 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542400 2023-11-26 23:13:18,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3615986.6666666665, ans=0.125 2023-11-26 23:13:24,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3616053.3333333335, ans=0.125 2023-11-26 23:13:34,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3616053.3333333335, ans=0.125 2023-11-26 23:13:47,146 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1350, loss[loss=0.08767, simple_loss=0.1246, pruned_loss=0.017, audio_tagging_loss=0.008368, over 16214.00 frames. ], tot_loss[loss=0.06441, simple_loss=0.08803, pruned_loss=0.0118, audio_tagging_loss=0.008595, over 3051780.68 frames. 
], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:13:57,055 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.12 vs. limit=22.5 2023-11-26 23:14:08,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3616320.0, ans=0.0 2023-11-26 23:14:09,905 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542450 2023-11-26 23:14:26,375 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 23:14:38,480 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=22.5 2023-11-26 23:14:41,708 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.060e+01 9.028e+01 9.621e+01 1.020e+02 1.402e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-26 23:14:42,811 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1400, loss[loss=0.08628, simple_loss=0.1192, pruned_loss=0.02113, audio_tagging_loss=0.005562, over 15399.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08831, pruned_loss=0.01188, audio_tagging_loss=0.008645, over 3052147.24 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:14:43,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3616520.0, ans=0.0 2023-11-26 23:15:05,787 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542500 2023-11-26 23:15:31,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3616786.6666666665, ans=0.125 2023-11-26 23:15:34,311 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.95 vs. limit=10.0 2023-11-26 23:15:39,430 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1450, loss[loss=0.0704, simple_loss=0.0974, pruned_loss=0.01222, audio_tagging_loss=0.009481, over 14414.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08841, pruned_loss=0.01185, audio_tagging_loss=0.008743, over 3051166.08 frames. 
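
The "Exclude cut" warnings are a length sanity check: AudioSet cuts carry a placeholder transcript, and a 1-second cut leaves fewer encoder frames after subsampling (23) than BPE tokens (24), which a transducer cannot align. A sketch of the check, assuming the usual icefall recipe logic and its Conv2dSubsampling front end; sp stands for the sentencepiece processor and the names are illustrative:

    def keep_cut(cut, sp) -> bool:
        # 100 input frames -> ((100 - 7) // 2 + 1) // 2 == 23 output frames,
        # matching the "after subsampling" count in the warnings above.
        T = ((cut.num_frames - 7) // 2 + 1) // 2
        tokens = sp.encode(cut.supervisions[0].text, out_type=str)
        return T >= len(tokens)  # here 23 < 24, so the cut is excluded
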
], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:15:39,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3616853.3333333335, ans=0.0 2023-11-26 23:15:41,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3616853.3333333335, ans=0.125 2023-11-26 23:15:43,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3616853.3333333335, ans=0.125 2023-11-26 23:15:45,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3616853.3333333335, ans=0.0 2023-11-26 23:15:53,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3616920.0, ans=0.125 2023-11-26 23:16:01,454 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542550 2023-11-26 23:16:09,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3616986.6666666665, ans=0.125 2023-11-26 23:16:25,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3617120.0, ans=0.025 2023-11-26 23:16:34,546 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.043e+01 9.029e+01 9.878e+01 1.085e+02 1.417e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-26 23:16:34,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3617186.6666666665, ans=0.0 2023-11-26 23:16:35,645 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1500, loss[loss=0.04006, simple_loss=0.04663, pruned_loss=0.006239, audio_tagging_loss=0.0105, over 15303.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08892, pruned_loss=0.01212, audio_tagging_loss=0.008785, over 3048848.84 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:16:58,176 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542600 2023-11-26 23:17:00,067 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.10 vs. limit=22.5 2023-11-26 23:17:01,733 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:17:19,233 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=22.5 2023-11-26 23:17:25,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3617453.3333333335, ans=0.04949747468305833 2023-11-26 23:17:31,310 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1550, loss[loss=0.05889, simple_loss=0.08144, pruned_loss=0.008116, audio_tagging_loss=0.01005, over 15665.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08869, pruned_loss=0.01204, audio_tagging_loss=0.008918, over 3049615.97 frames. 
], batch size: 59, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:17:35,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3617520.0, ans=0.0 2023-11-26 23:17:46,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3617586.6666666665, ans=0.1 2023-11-26 23:17:54,294 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542650 2023-11-26 23:17:59,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3617653.3333333335, ans=0.125 2023-11-26 23:18:22,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3617786.6666666665, ans=0.125 2023-11-26 23:18:26,139 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.35 vs. limit=15.0 2023-11-26 23:18:26,570 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 9.195e+01 9.800e+01 1.042e+02 1.280e+02, threshold=1.960e+02, percent-clipped=0.0 2023-11-26 23:18:27,648 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1600, loss[loss=0.05827, simple_loss=0.08077, pruned_loss=0.009353, audio_tagging_loss=0.008532, over 15362.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.08991, pruned_loss=0.01236, audio_tagging_loss=0.00901, over 3054253.13 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:18:49,967 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542700 2023-11-26 23:19:06,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3618053.3333333335, ans=0.125 2023-11-26 23:19:18,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3618120.0, ans=0.125 2023-11-26 23:19:18,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=3618120.0, ans=0.02 2023-11-26 23:19:24,000 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1650, loss[loss=0.0658, simple_loss=0.09375, pruned_loss=0.01125, audio_tagging_loss=0.007675, over 15888.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09002, pruned_loss=0.0123, audio_tagging_loss=0.008974, over 3048537.40 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:19:36,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3618253.3333333335, ans=0.125 2023-11-26 23:19:45,815 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542750 2023-11-26 23:19:47,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3618320.0, ans=0.125 2023-11-26 23:19:53,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3618320.0, ans=0.125 2023-11-26 23:20:02,336 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.38 vs. 
limit=12.0 2023-11-26 23:20:03,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3618386.6666666665, ans=0.125 2023-11-26 23:20:05,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3618386.6666666665, ans=0.125 2023-11-26 23:20:19,457 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.128e+01 8.999e+01 9.528e+01 1.009e+02 1.834e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 23:20:19,495 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1700, loss[loss=0.05686, simple_loss=0.07667, pruned_loss=0.009492, audio_tagging_loss=0.009037, over 16939.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.08995, pruned_loss=0.01236, audio_tagging_loss=0.008963, over 3048968.99 frames. ], batch size: 63, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:20:24,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3618520.0, ans=0.07 2023-11-26 23:20:41,988 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542800 2023-11-26 23:20:50,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3618653.3333333335, ans=0.125 2023-11-26 23:21:12,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3618786.6666666665, ans=0.125 2023-11-26 23:21:14,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3618853.3333333335, ans=0.1 2023-11-26 23:21:15,670 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1750, loss[loss=0.05992, simple_loss=0.08761, pruned_loss=0.008562, audio_tagging_loss=0.007548, over 15191.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08974, pruned_loss=0.01226, audio_tagging_loss=0.008917, over 3048921.74 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:21:17,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3618853.3333333335, ans=0.125 2023-11-26 23:21:32,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3618920.0, ans=0.2 2023-11-26 23:21:38,136 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542850 2023-11-26 23:21:42,027 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.81 vs. limit=10.0 2023-11-26 23:21:42,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3618986.6666666665, ans=0.0 2023-11-26 23:22:11,433 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 8.875e+01 9.667e+01 1.011e+02 1.829e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-26 23:22:11,473 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1800, loss[loss=0.05904, simple_loss=0.08354, pruned_loss=0.009141, audio_tagging_loss=0.00813, over 14643.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09056, pruned_loss=0.01241, audio_tagging_loss=0.008765, over 3048190.14 frames. 
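
The optim.py lines summarize recent gradient norms as five quantiles (min, 25%, median, 75%, max), and the clipping threshold tracks the median scaled by Clipping_scale: in the entry above, 9.667e+01 x 2.0 gives the logged threshold of 1.933e+02, which is why percent-clipped stays at zero in steady state. A simplified sketch of that rule, not ScaledAdam's exact implementation:

    import torch

    def clipping_threshold(recent_norms, clipping_scale=2.0):
        q = torch.quantile(torch.tensor(recent_norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        return clipping_scale * q[2].item()  # twice the median grad norm

    # A gradient is shrunk only when its norm exceeds the threshold:
    #   scale = min(1.0, threshold / grad_norm)
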
], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:22:24,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3619253.3333333335, ans=0.0 2023-11-26 23:22:24,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3619253.3333333335, ans=0.1 2023-11-26 23:22:34,090 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542900 2023-11-26 23:22:38,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3619320.0, ans=0.125 2023-11-26 23:22:40,598 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.00 vs. limit=15.0 2023-11-26 23:22:46,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3619386.6666666665, ans=0.125 2023-11-26 23:22:50,057 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.78 vs. limit=15.0 2023-11-26 23:22:59,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3619453.3333333335, ans=0.125 2023-11-26 23:23:07,871 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1850, loss[loss=0.05521, simple_loss=0.07297, pruned_loss=0.01028, audio_tagging_loss=0.008442, over 14723.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09063, pruned_loss=0.01234, audio_tagging_loss=0.008654, over 3052699.56 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:23:09,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3619520.0, ans=0.1 2023-11-26 23:23:19,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3619586.6666666665, ans=0.95 2023-11-26 23:23:30,134 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 542950 2023-11-26 23:23:39,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3619653.3333333335, ans=0.125 2023-11-26 23:23:44,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3619720.0, ans=0.125 2023-11-26 23:23:56,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3619786.6666666665, ans=0.125 2023-11-26 23:24:04,164 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1900, loss[loss=0.05928, simple_loss=0.08234, pruned_loss=0.009788, audio_tagging_loss=0.008326, over 14964.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08932, pruned_loss=0.0121, audio_tagging_loss=0.00865, over 3051314.06 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:24:05,239 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.061e+01 9.189e+01 9.752e+01 1.031e+02 1.213e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-26 23:24:10,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3619853.3333333335, ans=0.0 2023-11-26 23:24:18,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3619920.0, ans=0.0 2023-11-26 23:24:26,786 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543000 2023-11-26 23:24:48,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3620120.0, ans=0.0 2023-11-26 23:24:50,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3620120.0, ans=0.95 2023-11-26 23:24:52,966 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.46 vs. limit=15.0 2023-11-26 23:24:59,808 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 1950, loss[loss=0.06627, simple_loss=0.09059, pruned_loss=0.01276, audio_tagging_loss=0.008215, over 15593.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08939, pruned_loss=0.01199, audio_tagging_loss=0.008644, over 3048434.36 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:25:10,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3620253.3333333335, ans=0.0 2023-11-26 23:25:10,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3620253.3333333335, ans=0.125 2023-11-26 23:25:19,465 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.80 vs. limit=15.0 2023-11-26 23:25:19,564 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.06 vs. limit=15.0 2023-11-26 23:25:22,770 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543050 2023-11-26 23:25:24,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3620320.0, ans=0.125 2023-11-26 23:25:30,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3620320.0, ans=0.0 2023-11-26 23:25:36,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3620386.6666666665, ans=0.125 2023-11-26 23:25:38,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3620386.6666666665, ans=0.125 2023-11-26 23:25:56,326 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2000, loss[loss=0.06562, simple_loss=0.08587, pruned_loss=0.01089, audio_tagging_loss=0.0118, over 15970.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08965, pruned_loss=0.0121, audio_tagging_loss=0.00877, over 3050739.76 frames. 
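
The grad_scale values in the batch summaries (32.0 earlier, halved down to 8.0, doubled back to 16.0 here) are the fp16 loss-scaling factor from mixed-precision training ('use_fp16': True): the scaler halves it when gradients overflow and grows it again after a stretch of stable steps. A minimal torch.cuda.amp sketch of that dynamic with a toy model; the recipe's actual loop differs in detail:

    import torch

    model = torch.nn.Linear(80, 500).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=0.045)
    scaler = torch.cuda.amp.GradScaler()

    for _ in range(3):
        x = torch.randn(8, 80, device="cuda")
        opt.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(x).square().mean()
        scaler.scale(loss).backward()  # backprop through the scaled loss
        scaler.step(opt)               # skipped if inf/nan gradients are found
        scaler.update()                # halve on overflow, else grow slowly
        print(scaler.get_scale())      # the quantity logged as grad_scale
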
], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:25:57,375 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.406e+01 8.817e+01 9.525e+01 1.016e+02 1.209e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-26 23:25:59,148 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.06 vs. limit=15.0 2023-11-26 23:26:02,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3620520.0, ans=0.125 2023-11-26 23:26:07,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3620586.6666666665, ans=0.0 2023-11-26 23:26:10,397 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.25 vs. limit=10.0 2023-11-26 23:26:12,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3620586.6666666665, ans=0.0 2023-11-26 23:26:18,574 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543100 2023-11-26 23:26:24,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3620653.3333333335, ans=0.125 2023-11-26 23:26:52,270 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2050, loss[loss=0.05257, simple_loss=0.05932, pruned_loss=0.01078, audio_tagging_loss=0.01213, over 15842.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09077, pruned_loss=0.01237, audio_tagging_loss=0.008689, over 3048225.71 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:27:01,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3620853.3333333335, ans=0.125 2023-11-26 23:27:14,800 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543150 2023-11-26 23:27:24,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3620986.6666666665, ans=0.1 2023-11-26 23:27:32,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3621053.3333333335, ans=0.125 2023-11-26 23:27:42,366 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.68 vs. limit=15.0 2023-11-26 23:27:48,127 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2100, loss[loss=0.07436, simple_loss=0.08625, pruned_loss=0.01346, audio_tagging_loss=0.01777, over 13919.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08948, pruned_loss=0.01205, audio_tagging_loss=0.008645, over 3041321.30 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:27:50,227 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.873e+01 9.430e+01 1.020e+02 1.802e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 23:28:00,777 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.79 vs. limit=10.0 2023-11-26 23:28:05,039 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. 
limit=6.0 2023-11-26 23:28:10,385 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543200 2023-11-26 23:28:10,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3621320.0, ans=0.0 2023-11-26 23:28:14,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3621320.0, ans=0.125 2023-11-26 23:28:33,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3621453.3333333335, ans=0.125 2023-11-26 23:28:44,526 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2150, loss[loss=0.07335, simple_loss=0.0998, pruned_loss=0.01672, audio_tagging_loss=0.006731, over 15794.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08932, pruned_loss=0.01207, audio_tagging_loss=0.00846, over 3035338.87 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:28:51,598 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:28:58,912 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.46 vs. limit=22.5 2023-11-26 23:29:00,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3621586.6666666665, ans=0.0 2023-11-26 23:29:07,092 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.30 vs. limit=15.0 2023-11-26 23:29:07,515 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543250 2023-11-26 23:29:17,490 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 23:29:17,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3621720.0, ans=0.0 2023-11-26 23:29:25,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3621720.0, ans=0.0 2023-11-26 23:29:40,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3621853.3333333335, ans=0.1 2023-11-26 23:29:41,012 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2200, loss[loss=0.0529, simple_loss=0.08358, pruned_loss=0.005088, audio_tagging_loss=0.006023, over 16686.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08939, pruned_loss=0.01198, audio_tagging_loss=0.008492, over 3040107.76 frames. 
], batch size: 61, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:29:43,113 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.448e+01 8.935e+01 9.696e+01 1.032e+02 1.602e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-26 23:30:03,484 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543300 2023-11-26 23:30:10,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3621986.6666666665, ans=0.04949747468305833 2023-11-26 23:30:11,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3621986.6666666665, ans=0.2 2023-11-26 23:30:20,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3622053.3333333335, ans=0.125 2023-11-26 23:30:36,864 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2250, loss[loss=0.05213, simple_loss=0.06816, pruned_loss=0.009713, audio_tagging_loss=0.008336, over 14405.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08881, pruned_loss=0.01178, audio_tagging_loss=0.008546, over 3041670.81 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:30:47,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3622253.3333333335, ans=0.125 2023-11-26 23:30:49,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3622253.3333333335, ans=0.125 2023-11-26 23:30:58,754 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543350 2023-11-26 23:31:11,910 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.90 vs. limit=15.0 2023-11-26 23:31:22,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3622453.3333333335, ans=0.1 2023-11-26 23:31:32,012 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2300, loss[loss=0.07325, simple_loss=0.09969, pruned_loss=0.01505, audio_tagging_loss=0.008356, over 16862.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08778, pruned_loss=0.01189, audio_tagging_loss=0.008701, over 3039266.33 frames. 
], batch size: 62, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:31:33,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3622520.0, ans=0.1 2023-11-26 23:31:34,118 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.834e+01 8.796e+01 9.547e+01 1.006e+02 1.160e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-26 23:31:35,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3622520.0, ans=0.1 2023-11-26 23:31:36,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3622520.0, ans=0.0 2023-11-26 23:31:39,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3622520.0, ans=0.125 2023-11-26 23:31:41,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3622586.6666666665, ans=0.125 2023-11-26 23:31:47,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3622586.6666666665, ans=0.035 2023-11-26 23:31:47,508 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.24 vs. limit=10.0 2023-11-26 23:31:55,050 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543400 2023-11-26 23:32:07,717 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:32:15,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=3622720.0, ans=0.02 2023-11-26 23:32:18,389 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:32:20,292 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 23:32:27,717 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2350, loss[loss=0.08082, simple_loss=0.1118, pruned_loss=0.01848, audio_tagging_loss=0.006441, over 15223.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08848, pruned_loss=0.012, audio_tagging_loss=0.008808, over 3045453.28 frames. 
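
Each per-batch loss decomposes as the configured weighted sum loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss, matching the recipe's simple_loss_scale of 0.5 and audio_tagging_loss_scale of 1.0, with the pruned term at full weight this far past warm-up. Checking the batch-2350 entry above:

    simple, pruned, at = 0.1118, 0.01848, 0.006441
    print(round(0.5 * simple + pruned + 1.0 * at, 5))  # 0.08082, as logged
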
], batch size: 57, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:32:31,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3622853.3333333335, ans=0.1 2023-11-26 23:32:36,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3622853.3333333335, ans=0.125 2023-11-26 23:32:41,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3622920.0, ans=0.0 2023-11-26 23:32:51,413 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543450 2023-11-26 23:33:25,279 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2400, loss[loss=0.09238, simple_loss=0.1232, pruned_loss=0.0228, audio_tagging_loss=0.007979, over 14989.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.0891, pruned_loss=0.01218, audio_tagging_loss=0.008848, over 3045772.63 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:33:27,451 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.661e+01 8.979e+01 9.586e+01 1.037e+02 1.629e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-26 23:33:28,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3623186.6666666665, ans=0.0 2023-11-26 23:33:29,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3623186.6666666665, ans=0.0 2023-11-26 23:33:34,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3623186.6666666665, ans=0.07 2023-11-26 23:33:47,462 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543500 2023-11-26 23:34:21,516 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2450, loss[loss=0.05618, simple_loss=0.07402, pruned_loss=0.01089, audio_tagging_loss=0.008278, over 16364.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08886, pruned_loss=0.01212, audio_tagging_loss=0.008832, over 3039005.30 frames. ], batch size: 62, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:34:34,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3623586.6666666665, ans=10.0 2023-11-26 23:34:35,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3623586.6666666665, ans=0.2 2023-11-26 23:34:39,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3623586.6666666665, ans=0.1 2023-11-26 23:34:44,214 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543550 2023-11-26 23:34:54,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3623720.0, ans=0.0 2023-11-26 23:34:59,246 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.81 vs. limit=15.0 2023-11-26 23:35:04,632 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.24 vs. 
limit=15.0 2023-11-26 23:35:06,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3623786.6666666665, ans=0.1 2023-11-26 23:35:16,741 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2500, loss[loss=0.06576, simple_loss=0.09009, pruned_loss=0.01288, audio_tagging_loss=0.007836, over 14796.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08853, pruned_loss=0.01189, audio_tagging_loss=0.008829, over 3041087.46 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:35:16,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3623853.3333333335, ans=0.125 2023-11-26 23:35:18,813 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.712e+01 8.886e+01 9.376e+01 1.002e+02 1.338e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-26 23:35:31,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3623920.0, ans=0.2 2023-11-26 23:35:40,222 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543600 2023-11-26 23:35:44,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3623986.6666666665, ans=0.125 2023-11-26 23:35:45,400 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. limit=6.0 2023-11-26 23:36:14,148 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2550, loss[loss=0.05794, simple_loss=0.07825, pruned_loss=0.009034, audio_tagging_loss=0.009782, over 15891.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08824, pruned_loss=0.01189, audio_tagging_loss=0.008812, over 3041115.08 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:36:24,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3624253.3333333335, ans=0.125 2023-11-26 23:36:33,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3624253.3333333335, ans=0.1 2023-11-26 23:36:36,164 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543650 2023-11-26 23:36:36,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3624320.0, ans=0.125 2023-11-26 23:36:42,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3624320.0, ans=0.125 2023-11-26 23:37:02,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3624453.3333333335, ans=0.0 2023-11-26 23:37:04,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3624453.3333333335, ans=0.1 2023-11-26 23:37:09,918 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2600, loss[loss=0.06173, simple_loss=0.08165, pruned_loss=0.01215, audio_tagging_loss=0.008763, over 15397.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08864, pruned_loss=0.0119, audio_tagging_loss=0.00854, over 3041348.16 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:37:11,965 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.743e+01 9.424e+01 1.014e+02 1.712e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 23:37:12,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3624520.0, ans=0.125 2023-11-26 23:37:31,797 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543700 2023-11-26 23:37:38,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3624653.3333333335, ans=0.0 2023-11-26 23:38:05,154 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2650, loss[loss=0.06197, simple_loss=0.07836, pruned_loss=0.01222, audio_tagging_loss=0.01057, over 13785.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.08853, pruned_loss=0.01192, audio_tagging_loss=0.008579, over 3037835.75 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:38:18,762 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.14 vs. limit=22.5 2023-11-26 23:38:19,741 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.19 vs. limit=22.5 2023-11-26 23:38:28,284 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543750 2023-11-26 23:38:51,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3625120.0, ans=0.025 2023-11-26 23:39:01,795 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2700, loss[loss=0.05837, simple_loss=0.08367, pruned_loss=0.00875, audio_tagging_loss=0.007789, over 16452.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08906, pruned_loss=0.01197, audio_tagging_loss=0.00853, over 3042129.33 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:39:03,858 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.301e+01 8.924e+01 9.565e+01 1.006e+02 1.395e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-26 23:39:24,312 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543800 2023-11-26 23:39:26,174 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.59 vs. limit=22.5 2023-11-26 23:39:42,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3625386.6666666665, ans=0.125 2023-11-26 23:39:50,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3625453.3333333335, ans=0.0 2023-11-26 23:39:58,533 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2750, loss[loss=0.0714, simple_loss=0.1008, pruned_loss=0.01477, audio_tagging_loss=0.006216, over 14641.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08955, pruned_loss=0.01213, audio_tagging_loss=0.008469, over 3043469.70 frames. 
], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:40:03,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3625520.0, ans=0.125 2023-11-26 23:40:09,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3625586.6666666665, ans=0.0 2023-11-26 23:40:20,214 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543850 2023-11-26 23:40:34,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3625720.0, ans=0.5 2023-11-26 23:40:34,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3625720.0, ans=0.1 2023-11-26 23:40:39,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3625720.0, ans=0.0 2023-11-26 23:40:42,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3625786.6666666665, ans=0.125 2023-11-26 23:40:42,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3625786.6666666665, ans=0.125 2023-11-26 23:40:44,678 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 23:40:53,062 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2800, loss[loss=0.06316, simple_loss=0.09087, pruned_loss=0.01164, audio_tagging_loss=0.006082, over 16599.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08946, pruned_loss=0.01202, audio_tagging_loss=0.008486, over 3039236.81 frames. ], batch size: 63, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:40:55,216 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.871e+01 8.947e+01 9.554e+01 1.028e+02 1.223e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 23:40:56,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3625853.3333333335, ans=0.125 2023-11-26 23:40:57,901 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.38 vs. limit=15.0 2023-11-26 23:41:09,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3625920.0, ans=0.025 2023-11-26 23:41:11,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3625920.0, ans=0.0 2023-11-26 23:41:15,463 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543900 2023-11-26 23:41:40,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3626120.0, ans=0.125 2023-11-26 23:41:49,638 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2850, loss[loss=0.06785, simple_loss=0.09684, pruned_loss=0.008843, audio_tagging_loss=0.01059, over 17026.00 frames. 
], tot_loss[loss=0.06495, simple_loss=0.08898, pruned_loss=0.01196, audio_tagging_loss=0.008497, over 3038496.36 frames. ], batch size: 62, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:41:55,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3626186.6666666665, ans=0.0 2023-11-26 23:41:57,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3626186.6666666665, ans=0.2 2023-11-26 23:42:12,195 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 543950 2023-11-26 23:42:17,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3626320.0, ans=0.125 2023-11-26 23:42:26,875 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.33 vs. limit=15.0 2023-11-26 23:42:30,052 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.95 vs. limit=15.0 2023-11-26 23:42:45,081 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2900, loss[loss=0.07613, simple_loss=0.1047, pruned_loss=0.01479, audio_tagging_loss=0.008977, over 15614.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08984, pruned_loss=0.0122, audio_tagging_loss=0.008437, over 3045222.20 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:42:47,747 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 8.936e+01 9.597e+01 1.046e+02 1.381e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-26 23:42:52,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3626520.0, ans=0.125 2023-11-26 23:43:00,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3626586.6666666665, ans=0.0 2023-11-26 23:43:07,734 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544000 2023-11-26 23:43:09,608 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-544000.pt 2023-11-26 23:43:20,556 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0 2023-11-26 23:43:27,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3626720.0, ans=0.125 2023-11-26 23:43:42,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3626786.6666666665, ans=0.125 2023-11-26 23:43:44,239 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 2950, loss[loss=0.08503, simple_loss=0.1156, pruned_loss=0.01759, audio_tagging_loss=0.009627, over 15262.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08941, pruned_loss=0.01218, audio_tagging_loss=0.008521, over 3050410.48 frames. 
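
The checkpoint-544000.pt save above fires on a global-batch schedule rather than at an epoch boundary: 544000 is a multiple of the recipe's save_every_n of 4000. A simplified sketch of the trigger; the actual save also captures sampler, scheduler and grad-scaler state:

    import torch

    def maybe_save(model, optimizer, batch_idx_train, exp_dir,
                   save_every_n=4000):
        if batch_idx_train > 0 and batch_idx_train % save_every_n == 0:
            torch.save({"model": model.state_dict(),
                        "optimizer": optimizer.state_dict()},
                       f"{exp_dir}/checkpoint-{batch_idx_train}.pt")
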
], batch size: 54, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:44:02,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3626920.0, ans=0.2 2023-11-26 23:44:06,774 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544050 2023-11-26 23:44:29,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3627120.0, ans=0.0 2023-11-26 23:44:37,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3627120.0, ans=0.125 2023-11-26 23:44:38,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3627120.0, ans=0.0 2023-11-26 23:44:38,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3627120.0, ans=0.0 2023-11-26 23:44:40,275 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3000, loss[loss=0.04811, simple_loss=0.05663, pruned_loss=0.008759, audio_tagging_loss=0.01103, over 14152.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08981, pruned_loss=0.01232, audio_tagging_loss=0.008583, over 3044774.25 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:44:40,277 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-26 23:44:58,662 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.9817, 2.8867, 2.8654, 2.7253, 3.3474, 3.3622, 3.1214, 3.5814], device='cuda:0') 2023-11-26 23:45:03,307 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4996, 3.4388, 3.7676, 3.7364], device='cuda:0') 2023-11-26 23:45:12,602 INFO [train_asr.py:1267] (0/4) Epoch 46, validation: loss=0.0572, simple_loss=0.05043, pruned_loss=0.00523, audio_tagging_loss=0.02676, over 4681554.00 frames. 2023-11-26 23:45:12,602 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-26 23:45:15,212 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.354e+01 9.002e+01 9.589e+01 1.016e+02 1.351e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-26 23:45:24,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3627253.3333333335, ans=0.125 2023-11-26 23:45:34,236 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.08 vs. limit=22.5 2023-11-26 23:45:35,154 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544100 2023-11-26 23:45:46,987 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.35 vs. limit=15.0 2023-11-26 23:46:08,624 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3050, loss[loss=0.09028, simple_loss=0.1302, pruned_loss=0.01626, audio_tagging_loss=0.00892, over 15772.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08994, pruned_loss=0.0122, audio_tagging_loss=0.008652, over 3044673.59 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:46:28,306 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.80 vs. 
limit=22.5 2023-11-26 23:46:30,849 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544150 2023-11-26 23:46:39,906 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 23:47:04,317 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3100, loss[loss=0.08225, simple_loss=0.1236, pruned_loss=0.01527, audio_tagging_loss=0.005176, over 16389.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09142, pruned_loss=0.01251, audio_tagging_loss=0.008652, over 3044987.04 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:47:08,071 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 9.067e+01 9.651e+01 1.052e+02 1.316e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-26 23:47:21,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3627920.0, ans=0.125 2023-11-26 23:47:22,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3627920.0, ans=0.1 2023-11-26 23:47:27,278 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544200 2023-11-26 23:47:41,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3628053.3333333335, ans=0.1 2023-11-26 23:47:44,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3628053.3333333335, ans=0.1 2023-11-26 23:47:50,679 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.79 vs. limit=15.0 2023-11-26 23:47:58,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3628120.0, ans=0.05 2023-11-26 23:48:01,332 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3150, loss[loss=0.0605, simple_loss=0.08131, pruned_loss=0.00857, audio_tagging_loss=0.01128, over 15399.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09075, pruned_loss=0.0122, audio_tagging_loss=0.008862, over 3037295.13 frames. 
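
The Whitening lines report a covariance-flatness statistic per module: it is 1.0 when activations are perfectly white (channel covariance proportional to the identity) and grows toward the channel count as variance concentrates in a few directions, and scaling.Whiten applies a corrective gradient penalty only when the metric exceeds the logged limit. A simplified single-group sketch of such a statistic; the upstream metric differs in detail:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        """x: (num_frames, num_channels) activations, one whitening group."""
        c = x.shape[-1]
        cov = x.T @ x / x.shape[0]
        # equals 1.0 exactly when cov is a multiple of the identity
        return (c * torch.trace(cov @ cov) / torch.trace(cov) ** 2).item()

    print(whitening_metric(torch.randn(4000, 384)))  # near 1.0 for white noise
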
], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:48:04,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3628186.6666666665, ans=0.125 2023-11-26 23:48:05,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3628186.6666666665, ans=0.125 2023-11-26 23:48:16,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3628253.3333333335, ans=0.125 2023-11-26 23:48:23,395 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544250 2023-11-26 23:48:30,476 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:48:42,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3628386.6666666665, ans=0.125 2023-11-26 23:48:46,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3628453.3333333335, ans=0.125 2023-11-26 23:48:50,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3628453.3333333335, ans=0.0 2023-11-26 23:48:57,457 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3200, loss[loss=0.07268, simple_loss=0.09931, pruned_loss=0.0145, audio_tagging_loss=0.008528, over 15540.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09092, pruned_loss=0.01223, audio_tagging_loss=0.008875, over 3036263.11 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:48:58,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3628520.0, ans=0.2 2023-11-26 23:49:00,630 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 8.824e+01 9.434e+01 1.022e+02 1.249e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 23:49:17,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3628586.6666666665, ans=0.125 2023-11-26 23:49:19,855 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544300 2023-11-26 23:49:37,908 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.23 vs. limit=15.0 2023-11-26 23:49:49,927 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:49:53,381 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3250, loss[loss=0.04862, simple_loss=0.06049, pruned_loss=0.01006, audio_tagging_loss=0.008315, over 14453.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09, pruned_loss=0.01223, audio_tagging_loss=0.008923, over 3036500.44 frames. 
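The attn_weights_entropy tensors printed by zipformer.py during the validation pass above are a per-head diagnostic: each number is roughly the average entropy of one attention head's weight distribution, so a small value flags a head that concentrates on very few frames. A sketch of the quantity, assuming weights of shape (num_heads, tgt_len, src_len) that sum to 1 over the last axis (the exact reduction used in zipformer.py may differ):

```python
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    """Average entropy per attention head.

    attn: (num_heads, tgt_len, src_len), rows normalized to sum to 1.
    Returns one value per head, like the tensors logged above.
    """
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (num_heads, tgt_len)
    return ent.mean(dim=-1)

# Uniform weights over 16 source frames give entropy log(16) ~ 2.77 per head:
print(attn_weights_entropy(torch.full((4, 8, 16), 1 / 16)))
```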
], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:49:54,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3628853.3333333335, ans=0.0 2023-11-26 23:49:55,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3628853.3333333335, ans=0.0 2023-11-26 23:49:58,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3628853.3333333335, ans=0.0 2023-11-26 23:49:59,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=3628853.3333333335, ans=0.2 2023-11-26 23:50:07,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3628920.0, ans=0.0 2023-11-26 23:50:14,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3628986.6666666665, ans=0.125 2023-11-26 23:50:15,637 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544350 2023-11-26 23:50:16,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3628986.6666666665, ans=0.1 2023-11-26 23:50:48,932 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3300, loss[loss=0.05645, simple_loss=0.06622, pruned_loss=0.01087, audio_tagging_loss=0.01247, over 14837.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08926, pruned_loss=0.01228, audio_tagging_loss=0.009064, over 3036637.53 frames. ], batch size: 62, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:50:52,766 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.794e+01 9.136e+01 9.828e+01 1.104e+02 1.362e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-26 23:51:11,466 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544400 2023-11-26 23:51:22,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3629386.6666666665, ans=0.0 2023-11-26 23:51:28,239 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.20 vs. limit=15.0 2023-11-26 23:51:28,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3629386.6666666665, ans=0.125 2023-11-26 23:51:29,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3629386.6666666665, ans=0.0 2023-11-26 23:51:32,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3629386.6666666665, ans=0.0 2023-11-26 23:51:45,137 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3350, loss[loss=0.08452, simple_loss=0.1268, pruned_loss=0.01463, audio_tagging_loss=0.006491, over 15269.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08894, pruned_loss=0.01222, audio_tagging_loss=0.009009, over 3036952.53 frames. 
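The many scaling.py:213 lines are ScheduledFloat dumps: regularization hyperparameters (dropout probabilities, skip rates, balancer probs) that are piecewise-linear functions of a batch counter rather than constants, with `ans` being the value in effect at the logged batch_count. By batch_count ~ 3.6e6 most schedules have long since reached their final value, which is why the same `ans` numbers (0.0, 0.1, 0.125, 0.2) repeat. A minimal sketch of the behaviour, assuming a list of (batch_count, value) breakpoints with flat extrapolation at both ends:

```python
def scheduled_float(batch_count: float, points: list[tuple[float, float]]) -> float:
    """Piecewise-linear schedule over batch_count, clamped at both ends.

    A sketch of the behaviour behind the ScheduledFloat log lines; the
    real class in icefall's scaling.py wraps this in a torch-friendly object.
    """
    points = sorted(points)
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
    return points[-1][1]

# e.g. a conv_skip_rate that decays 0.2 -> 0.0 early in training:
print(scheduled_float(3_627_120.0, [(0.0, 0.2), (20_000.0, 0.0)]))  # 0.0
```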
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:52:07,888 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544450 2023-11-26 23:52:21,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3629720.0, ans=0.1 2023-11-26 23:52:37,313 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.80 vs. limit=10.0 2023-11-26 23:52:39,256 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.03 vs. limit=15.0 2023-11-26 23:52:40,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3629853.3333333335, ans=0.0 2023-11-26 23:52:40,872 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3400, loss[loss=0.06885, simple_loss=0.1023, pruned_loss=0.01128, audio_tagging_loss=0.006409, over 16318.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08868, pruned_loss=0.01213, audio_tagging_loss=0.008881, over 3037351.25 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:52:45,598 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.259e+01 8.870e+01 9.488e+01 1.024e+02 1.498e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-26 23:52:48,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3629853.3333333335, ans=0.125 2023-11-26 23:52:53,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3629920.0, ans=0.1 2023-11-26 23:52:53,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3629920.0, ans=0.0 2023-11-26 23:53:03,586 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544500 2023-11-26 23:53:06,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3629986.6666666665, ans=0.125 2023-11-26 23:53:10,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3629986.6666666665, ans=0.07 2023-11-26 23:53:21,756 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.43 vs. limit=10.0 2023-11-26 23:53:33,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3630120.0, ans=0.125 2023-11-26 23:53:37,194 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3450, loss[loss=0.063, simple_loss=0.09385, pruned_loss=0.008655, audio_tagging_loss=0.007417, over 17038.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08897, pruned_loss=0.01206, audio_tagging_loss=0.008707, over 3035399.76 frames. 
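The optim.py:476 lines summarise ScaledAdam's adaptive gradient clipping: the five numbers are the (min, 25%, 50%, 75%, max) of recent gradient norms, and the logged threshold equals clipping_scale times the median (2.0 x 9.488e+01 ~ 1.898e+02 in the entry above); percent-clipped is the share of recent batches whose norm exceeded the threshold, 0.0 here. A sketch of that bookkeeping under those assumptions (ScaledAdam applies the clip inside its step rather than via a helper like this):

```python
import torch

def clipping_summary(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    """Quartiles of recent grad norms plus the median-based clip threshold.

    Mirrors the numbers in the optim.py log line above.
    """
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]
    pct = 100.0 * (grad_norms > threshold).float().mean()
    return q.tolist(), threshold.item(), pct.item()

print(clipping_summary(70 + 80 * torch.rand(1000)))
```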
], batch size: 65, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:53:46,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3630186.6666666665, ans=10.0 2023-11-26 23:53:58,890 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544550 2023-11-26 23:54:01,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3630320.0, ans=0.2 2023-11-26 23:54:21,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3630453.3333333335, ans=0.09899494936611666 2023-11-26 23:54:32,560 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3500, loss[loss=0.06179, simple_loss=0.08984, pruned_loss=0.009444, audio_tagging_loss=0.007424, over 15862.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08896, pruned_loss=0.01215, audio_tagging_loss=0.008607, over 3041328.86 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:54:35,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3630520.0, ans=0.0 2023-11-26 23:54:36,794 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.154e+01 9.117e+01 9.795e+01 1.053e+02 1.409e+02, threshold=1.959e+02, percent-clipped=0.0 2023-11-26 23:54:40,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3630520.0, ans=0.0 2023-11-26 23:54:46,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3630586.6666666665, ans=0.1 2023-11-26 23:54:46,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3630586.6666666665, ans=0.0 2023-11-26 23:54:54,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3630653.3333333335, ans=0.1 2023-11-26 23:54:55,517 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544600 2023-11-26 23:55:00,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3630653.3333333335, ans=0.1 2023-11-26 23:55:01,003 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 23:55:01,515 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.96 vs. limit=15.0 2023-11-26 23:55:22,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3630786.6666666665, ans=0.07 2023-11-26 23:55:28,211 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3550, loss[loss=0.06266, simple_loss=0.08934, pruned_loss=0.009881, audio_tagging_loss=0.008109, over 15248.00 frames. ], tot_loss[loss=0.06447, simple_loss=0.08799, pruned_loss=0.01193, audio_tagging_loss=0.008547, over 3039677.51 frames. 
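The scaling.py:1022 Whitening lines report a regularizer that watches the covariance of a module's activations: the metric is 1.0 when the covariance is isotropic ("white") and grows as variance concentrates in fewer directions, and a gradient penalty only engages once the metric crosses its limit. Every entry here reads metric < limit, i.e. the penalty is currently dormant. A sketch of one such metric, d * sum(l_i^2) / (sum(l_i))^2 over covariance eigenvalues l_i (an assumed reconstruction; scaling.py's exact formula may differ):

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """How far a batch of d-dim features is from white covariance.

    Returns d * trace(C @ C) / trace(C)**2 for covariance C: exactly 1.0
    for isotropic features, approaching d when one direction dominates.
    """
    x = x.reshape(-1, x.shape[-1]).float()
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    d = cov.shape[0]
    return (d * (cov * cov).sum() / cov.trace() ** 2).item()

print(whitening_metric(torch.randn(1024, 192)))  # near 1 for white noise
```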
], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:55:50,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3630986.6666666665, ans=0.0 2023-11-26 23:55:51,546 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544650 2023-11-26 23:56:02,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3631053.3333333335, ans=0.125 2023-11-26 23:56:06,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3631053.3333333335, ans=0.125 2023-11-26 23:56:07,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3631053.3333333335, ans=0.0 2023-11-26 23:56:18,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3631120.0, ans=0.0 2023-11-26 23:56:25,421 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3600, loss[loss=0.0721, simple_loss=0.1027, pruned_loss=0.01447, audio_tagging_loss=0.006292, over 15956.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.08822, pruned_loss=0.01193, audio_tagging_loss=0.008555, over 3039258.02 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:56:27,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3631186.6666666665, ans=0.0 2023-11-26 23:56:29,612 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.853e+01 8.770e+01 9.299e+01 1.012e+02 1.507e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-26 23:56:29,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3631186.6666666665, ans=0.0 2023-11-26 23:56:34,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3631186.6666666665, ans=0.125 2023-11-26 23:56:36,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3631253.3333333335, ans=0.1 2023-11-26 23:56:36,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3631253.3333333335, ans=0.07 2023-11-26 23:56:41,487 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.30 vs. limit=10.0 2023-11-26 23:56:44,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3631253.3333333335, ans=0.125 2023-11-26 23:56:47,213 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544700 2023-11-26 23:56:50,816 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.54 vs. 
limit=15.0 2023-11-26 23:57:02,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3631386.6666666665, ans=0.2 2023-11-26 23:57:09,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3631453.3333333335, ans=0.07 2023-11-26 23:57:20,897 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3650, loss[loss=0.03838, simple_loss=0.04483, pruned_loss=0.004466, audio_tagging_loss=0.0115, over 14783.00 frames. ], tot_loss[loss=0.06415, simple_loss=0.08752, pruned_loss=0.01186, audio_tagging_loss=0.008534, over 3036039.95 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:57:23,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3631520.0, ans=0.0 2023-11-26 23:57:38,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3631586.6666666665, ans=10.0 2023-11-26 23:57:43,360 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544750 2023-11-26 23:57:54,997 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.13 vs. limit=22.5 2023-11-26 23:57:58,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3631720.0, ans=0.125 2023-11-26 23:58:07,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3631786.6666666665, ans=0.125 2023-11-26 23:58:16,358 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3700, loss[loss=0.07077, simple_loss=0.08888, pruned_loss=0.01373, audio_tagging_loss=0.0126, over 15485.00 frames. ], tot_loss[loss=0.06416, simple_loss=0.08722, pruned_loss=0.01181, audio_tagging_loss=0.008735, over 3041483.47 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:58:20,635 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.470e+01 8.914e+01 9.498e+01 1.020e+02 1.600e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 23:58:20,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3631853.3333333335, ans=10.0 2023-11-26 23:58:22,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3631853.3333333335, ans=0.5 2023-11-26 23:58:26,593 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.27 vs. limit=10.0 2023-11-26 23:58:36,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3631920.0, ans=0.125 2023-11-26 23:58:37,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3631920.0, ans=0.125 2023-11-26 23:58:40,016 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544800 2023-11-26 23:58:56,481 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:59:13,825 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3750, loss[loss=0.07524, simple_loss=0.1007, pruned_loss=0.01405, audio_tagging_loss=0.01083, over 15425.00 frames. 
], tot_loss[loss=0.06492, simple_loss=0.08793, pruned_loss=0.0121, audio_tagging_loss=0.008854, over 3043590.62 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:59:23,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3632186.6666666665, ans=0.125 2023-11-26 23:59:25,424 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.61 vs. limit=22.5 2023-11-26 23:59:33,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3632253.3333333335, ans=0.1 2023-11-26 23:59:35,697 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544850 2023-11-26 23:59:40,146 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:59:42,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3632320.0, ans=0.125 2023-11-26 23:59:51,121 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 00:00:09,627 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3800, loss[loss=0.05841, simple_loss=0.08032, pruned_loss=0.01089, audio_tagging_loss=0.007354, over 15488.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08877, pruned_loss=0.01216, audio_tagging_loss=0.008709, over 3048920.45 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:00:14,900 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.742e+01 9.124e+01 9.737e+01 1.067e+02 1.479e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-27 00:00:16,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3632520.0, ans=0.0 2023-11-27 00:00:17,771 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.63 vs. limit=15.0 2023-11-27 00:00:24,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3632586.6666666665, ans=0.125 2023-11-27 00:00:31,627 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544900 2023-11-27 00:00:33,210 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.66 vs. 
limit=15.0 2023-11-27 00:00:43,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3632720.0, ans=0.125 2023-11-27 00:00:48,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3632720.0, ans=0.125 2023-11-27 00:00:57,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3632786.6666666665, ans=0.035 2023-11-27 00:01:01,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3632786.6666666665, ans=0.125 2023-11-27 00:01:04,884 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3850, loss[loss=0.07343, simple_loss=0.09723, pruned_loss=0.01742, audio_tagging_loss=0.007398, over 14729.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08879, pruned_loss=0.0122, audio_tagging_loss=0.008703, over 3051607.14 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:01:14,814 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.86 vs. limit=22.5 2023-11-27 00:01:21,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3632920.0, ans=0.0 2023-11-27 00:01:26,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3632920.0, ans=0.025 2023-11-27 00:01:28,067 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 544950 2023-11-27 00:01:44,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3633053.3333333335, ans=0.125 2023-11-27 00:01:45,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3633053.3333333335, ans=10.0 2023-11-27 00:01:45,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3633053.3333333335, ans=0.125 2023-11-27 00:01:53,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3633120.0, ans=0.125 2023-11-27 00:01:54,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3633120.0, ans=0.125 2023-11-27 00:02:01,465 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3900, loss[loss=0.0608, simple_loss=0.08187, pruned_loss=0.006815, audio_tagging_loss=0.01305, over 15093.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08891, pruned_loss=0.01218, audio_tagging_loss=0.008808, over 3046638.65 frames. 
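The four loss[] fields in each training record are not independent: the totals are consistent with 0.5 x simple_loss + pruned_loss + 1.0 x audio_tagging_loss, the steady-state weighting of a pruned-transducer recipe with CTC disabled. Checking this against the "Epoch 46, batch 3850" record above (the weights are inferred from the numbers, not read from code):

```python
# Loss decomposition check for the batch 3850 record above.
simple_loss, pruned_loss, audio_tagging_loss = 0.09723, 0.01742, 0.007398
total = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
print(round(total, 5))  # 0.07343 -> matches the logged loss=0.07343
```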
], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:02:02,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3633186.6666666665, ans=0.125 2023-11-27 00:02:07,293 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.712e+01 8.766e+01 9.510e+01 1.042e+02 1.590e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-27 00:02:23,949 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545000 2023-11-27 00:02:50,330 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 00:02:58,143 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 3950, loss[loss=0.0704, simple_loss=0.08258, pruned_loss=0.01695, audio_tagging_loss=0.01216, over 14902.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08909, pruned_loss=0.01219, audio_tagging_loss=0.008854, over 3046293.55 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:03:01,708 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 00:03:19,637 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545050 2023-11-27 00:03:39,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3633720.0, ans=0.125 2023-11-27 00:03:41,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3633720.0, ans=0.125 2023-11-27 00:03:44,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3633786.6666666665, ans=0.125 2023-11-27 00:03:48,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3633786.6666666665, ans=0.125 2023-11-27 00:03:53,740 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4000, loss[loss=0.06987, simple_loss=0.1026, pruned_loss=0.01129, audio_tagging_loss=0.007259, over 15269.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08915, pruned_loss=0.01216, audio_tagging_loss=0.008974, over 3037754.05 frames. 
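The scaling.py:1118 WithLoss lines track auxiliary penalties attached to the self-attention weights; loss-sum=0.000e+00 means the attached penalty is contributing nothing at the moment, i.e. the watched activations are inside their allowed range. A sketch of the general pattern, assuming a simple over-limit penalty (icefall wires the penalty in through a custom autograd function, and the actual penalized quantity may differ):

```python
import torch

def with_abs_penalty(x: torch.Tensor, limit: float = 10.0):
    """Pass x through unchanged and report a penalty on out-of-range values.

    The penalty is zero while |x| <= limit everywhere, matching the
    loss-sum=0.000e+00 readings above.
    """
    excess = (x.abs() - limit).clamp(min=0.0)
    return x, excess.sum()

_, loss_sum = with_abs_penalty(torch.randn(8, 16))
print(f"loss-sum={loss_sum:.3e}")  # 0.000e+00 for well-behaved inputs
```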
], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:03:59,120 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.786e+01 9.088e+01 9.544e+01 1.045e+02 1.311e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-27 00:04:00,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3633853.3333333335, ans=0.0 2023-11-27 00:04:04,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3633920.0, ans=0.1 2023-11-27 00:04:16,112 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545100 2023-11-27 00:04:21,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3633986.6666666665, ans=0.5 2023-11-27 00:04:27,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3634053.3333333335, ans=0.125 2023-11-27 00:04:39,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3634120.0, ans=0.125 2023-11-27 00:04:49,504 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4050, loss[loss=0.08543, simple_loss=0.126, pruned_loss=0.0172, audio_tagging_loss=0.005238, over 14618.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08889, pruned_loss=0.01216, audio_tagging_loss=0.009014, over 3034138.27 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:04:52,310 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 00:04:56,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3634186.6666666665, ans=0.125 2023-11-27 00:05:12,247 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545150 2023-11-27 00:05:35,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3634453.3333333335, ans=0.1 2023-11-27 00:05:38,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3634453.3333333335, ans=0.025 2023-11-27 00:05:46,168 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4100, loss[loss=0.06574, simple_loss=0.08073, pruned_loss=0.01527, audio_tagging_loss=0.0101, over 15586.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08941, pruned_loss=0.01227, audio_tagging_loss=0.009181, over 3034909.79 frames. 
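Two counters run through this log: model.py:807 prints the optimizer-step index (around 545,100 here), while the ScheduledFloat and Whitening lines use a batch_count near 3.63e6. Their ratio is almost exactly 20/3 ~ 6.67, consistent with the scheduled counter being duration-adjusted as batch_idx * (max_duration * world_size) / ref_duration; this is an inference from the numbers, in the spirit of icefall's get_adjusted_batch_count:

```python
# Relating the two counters (values taken from nearby log lines; the
# formula is the assumed duration adjustment described above).
batch_idx = 545_100
max_duration, world_size, ref_duration = 1000, 4, 600
adjusted = batch_idx * (max_duration * world_size) / ref_duration
print(adjusted)  # 3634000.0, in line with the batch_count ~3.634e6 here
```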
], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:05:52,466 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.629e+01 8.888e+01 9.665e+01 1.037e+02 1.522e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-27 00:06:06,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3634653.3333333335, ans=0.125 2023-11-27 00:06:07,385 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545200 2023-11-27 00:06:07,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3634653.3333333335, ans=0.0 2023-11-27 00:06:22,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3634720.0, ans=0.2 2023-11-27 00:06:24,019 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 00:06:41,837 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4150, loss[loss=0.07976, simple_loss=0.1092, pruned_loss=0.02023, audio_tagging_loss=0.004912, over 15619.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08968, pruned_loss=0.01226, audio_tagging_loss=0.008969, over 3032671.55 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:06:51,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3634920.0, ans=0.1 2023-11-27 00:07:01,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3634920.0, ans=10.0 2023-11-27 00:07:04,248 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545250 2023-11-27 00:07:22,302 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 00:07:25,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3635120.0, ans=0.125 2023-11-27 00:07:30,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3635120.0, ans=0.0 2023-11-27 00:07:37,621 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4200, loss[loss=0.06599, simple_loss=0.09568, pruned_loss=0.01134, audio_tagging_loss=0.006817, over 15566.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09083, pruned_loss=0.01241, audio_tagging_loss=0.008867, over 3037043.16 frames. 
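The grad_scale field that alternates between 16.0 and 32.0 across these records is the fp16 loss scale of the AMP grad scaler: it is halved whenever a backward pass produces inf/NaN gradients and grown back after a run of clean steps. A minimal self-contained loop showing the mechanism (PyTorch's default growth settings, not this run's exact ones; requires a CUDA device):

```python
import torch

model = torch.nn.Linear(8, 1).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

for step in range(3):
    opt.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(torch.randn(4, 8, device="cuda")).square().mean()
    scaler.scale(loss).backward()    # scaled backward in fp16
    scaler.step(opt)                 # skipped if gradients overflowed
    scaler.update()                  # halve on overflow, grow when stable
    print(step, scaler.get_scale())  # the grad_scale figure in the log
```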
], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:07:44,533 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 9.031e+01 9.580e+01 1.007e+02 1.196e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-27 00:08:00,818 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545300 2023-11-27 00:08:13,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3635386.6666666665, ans=0.125 2023-11-27 00:08:17,202 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.58 vs. limit=15.0 2023-11-27 00:08:18,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3635386.6666666665, ans=0.2 2023-11-27 00:08:19,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3635386.6666666665, ans=0.1 2023-11-27 00:08:25,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3635453.3333333335, ans=0.125 2023-11-27 00:08:33,938 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4250, loss[loss=0.0466, simple_loss=0.06332, pruned_loss=0.006558, audio_tagging_loss=0.008384, over 15219.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09194, pruned_loss=0.0124, audio_tagging_loss=0.008677, over 3041245.90 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:08:56,292 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545350 2023-11-27 00:09:11,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3635720.0, ans=0.0 2023-11-27 00:09:14,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3635720.0, ans=0.1 2023-11-27 00:09:29,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3635853.3333333335, ans=0.125 2023-11-27 00:09:30,137 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4300, loss[loss=0.04621, simple_loss=0.06268, pruned_loss=0.007724, audio_tagging_loss=0.007144, over 15174.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09118, pruned_loss=0.01224, audio_tagging_loss=0.008638, over 3049070.47 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:09:36,590 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.859e+01 9.001e+01 9.508e+01 1.030e+02 1.268e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-27 00:09:52,705 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545400 2023-11-27 00:09:57,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3635986.6666666665, ans=0.1 2023-11-27 00:10:25,661 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4350, loss[loss=0.0535, simple_loss=0.07753, pruned_loss=0.006206, audio_tagging_loss=0.008534, over 14682.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09144, pruned_loss=0.01216, audio_tagging_loss=0.008513, over 3046069.45 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 8.0 2023-11-27 00:10:42,856 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.50 vs. 
limit=5.0 2023-11-27 00:10:47,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3636253.3333333335, ans=0.1 2023-11-27 00:10:49,116 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545450 2023-11-27 00:10:54,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3636320.0, ans=0.125 2023-11-27 00:10:59,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3636386.6666666665, ans=0.1 2023-11-27 00:11:00,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3636386.6666666665, ans=0.125 2023-11-27 00:11:02,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3636386.6666666665, ans=0.2 2023-11-27 00:11:22,391 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4400, loss[loss=0.07814, simple_loss=0.1106, pruned_loss=0.01557, audio_tagging_loss=0.007262, over 14976.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09174, pruned_loss=0.01222, audio_tagging_loss=0.008478, over 3049260.18 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:11:25,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3636520.0, ans=0.125 2023-11-27 00:11:30,493 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.966e+01 9.047e+01 9.734e+01 1.041e+02 1.241e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-27 00:11:38,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3636586.6666666665, ans=0.125 2023-11-27 00:11:43,204 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.52 vs. limit=15.0 2023-11-27 00:11:45,021 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545500 2023-11-27 00:11:47,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3636653.3333333335, ans=0.125 2023-11-27 00:11:59,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3636720.0, ans=0.125 2023-11-27 00:12:18,831 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4450, loss[loss=0.07066, simple_loss=0.09866, pruned_loss=0.01166, audio_tagging_loss=0.009676, over 15103.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09099, pruned_loss=0.0122, audio_tagging_loss=0.008492, over 3049467.22 frames. 
], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:12:32,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3636920.0, ans=0.125 2023-11-27 00:12:33,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3636920.0, ans=0.2 2023-11-27 00:12:41,881 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545550 2023-11-27 00:12:48,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3636986.6666666665, ans=0.125 2023-11-27 00:12:56,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3637053.3333333335, ans=0.125 2023-11-27 00:13:10,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3637120.0, ans=0.0 2023-11-27 00:13:14,868 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4500, loss[loss=0.0642, simple_loss=0.09176, pruned_loss=0.0131, audio_tagging_loss=0.005217, over 14515.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.0911, pruned_loss=0.01235, audio_tagging_loss=0.008485, over 3050241.31 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:13:23,384 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.527e+01 8.728e+01 9.573e+01 1.027e+02 1.215e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 00:13:37,792 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545600 2023-11-27 00:13:50,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3637386.6666666665, ans=0.125 2023-11-27 00:14:03,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3637453.3333333335, ans=0.125 2023-11-27 00:14:11,576 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4550, loss[loss=0.05537, simple_loss=0.07589, pruned_loss=0.007457, audio_tagging_loss=0.009968, over 15939.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.09047, pruned_loss=0.01223, audio_tagging_loss=0.008498, over 3045086.93 frames. ], batch size: 63, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:14:14,158 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.15 vs. limit=22.5 2023-11-27 00:14:17,419 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.54 vs. limit=22.5 2023-11-27 00:14:21,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3637586.6666666665, ans=0.125 2023-11-27 00:14:25,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3637586.6666666665, ans=0.2 2023-11-27 00:14:33,597 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545650 2023-11-27 00:14:54,439 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 00:14:55,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3637786.6666666665, ans=0.0 2023-11-27 00:15:06,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3637853.3333333335, ans=0.07 2023-11-27 00:15:07,689 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4600, loss[loss=0.06373, simple_loss=0.08644, pruned_loss=0.01023, audio_tagging_loss=0.01029, over 14533.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08975, pruned_loss=0.01212, audio_tagging_loss=0.008638, over 3046288.22 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:15:15,122 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.389e+01 8.975e+01 9.578e+01 1.039e+02 1.809e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-27 00:15:29,896 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545700 2023-11-27 00:15:39,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3637986.6666666665, ans=0.05 2023-11-27 00:15:52,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3638120.0, ans=0.125 2023-11-27 00:16:02,964 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4650, loss[loss=0.05319, simple_loss=0.07014, pruned_loss=0.009305, audio_tagging_loss=0.00881, over 14096.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09008, pruned_loss=0.01229, audio_tagging_loss=0.008663, over 3045377.64 frames. ], batch size: 52, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:16:05,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3638186.6666666665, ans=0.125 2023-11-27 00:16:13,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3638253.3333333335, ans=0.0 2023-11-27 00:16:24,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3638253.3333333335, ans=0.07 2023-11-27 00:16:26,551 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545750 2023-11-27 00:16:31,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3638320.0, ans=0.125 2023-11-27 00:16:35,511 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.25 vs. 
limit=15.0 2023-11-27 00:16:44,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3638386.6666666665, ans=0.1 2023-11-27 00:16:45,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3638386.6666666665, ans=0.0 2023-11-27 00:16:46,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3638453.3333333335, ans=0.125 2023-11-27 00:16:59,971 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4700, loss[loss=0.06912, simple_loss=0.1012, pruned_loss=0.01114, audio_tagging_loss=0.00737, over 14091.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08987, pruned_loss=0.01217, audio_tagging_loss=0.00876, over 3043874.25 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:17:07,412 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 9.156e+01 9.734e+01 1.046e+02 1.264e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-27 00:17:13,144 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.94 vs. limit=15.0 2023-11-27 00:17:21,964 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545800 2023-11-27 00:17:26,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3638653.3333333335, ans=0.125 2023-11-27 00:17:27,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3638653.3333333335, ans=0.125 2023-11-27 00:17:31,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3638720.0, ans=0.0 2023-11-27 00:17:33,139 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.60 vs. limit=15.0 2023-11-27 00:17:35,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3638720.0, ans=0.125 2023-11-27 00:17:44,902 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.18 vs. limit=15.0 2023-11-27 00:17:53,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3638786.6666666665, ans=0.125 2023-11-27 00:17:56,608 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4750, loss[loss=0.07008, simple_loss=0.0925, pruned_loss=0.01289, audio_tagging_loss=0.01095, over 14362.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08992, pruned_loss=0.01206, audio_tagging_loss=0.008823, over 3044979.61 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:18:09,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3638920.0, ans=0.2 2023-11-27 00:18:17,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3638986.6666666665, ans=0.07 2023-11-27 00:18:18,016 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.78 vs. 
limit=15.0 2023-11-27 00:18:18,635 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545850 2023-11-27 00:18:39,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3639120.0, ans=0.125 2023-11-27 00:18:48,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3639120.0, ans=0.125 2023-11-27 00:18:51,441 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4800, loss[loss=0.07459, simple_loss=0.1063, pruned_loss=0.01377, audio_tagging_loss=0.007666, over 15051.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08944, pruned_loss=0.01204, audio_tagging_loss=0.008944, over 3051388.79 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:18:59,441 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 8.803e+01 9.667e+01 1.040e+02 1.360e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-27 00:19:07,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3639253.3333333335, ans=0.09899494936611666 2023-11-27 00:19:14,545 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545900 2023-11-27 00:19:28,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3639386.6666666665, ans=0.125 2023-11-27 00:19:31,882 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.80 vs. limit=15.0 2023-11-27 00:19:44,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3639453.3333333335, ans=0.0 2023-11-27 00:19:48,916 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4850, loss[loss=0.06192, simple_loss=0.08494, pruned_loss=0.0118, audio_tagging_loss=0.007654, over 14966.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08955, pruned_loss=0.01206, audio_tagging_loss=0.008958, over 3042328.98 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:19:51,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3639520.0, ans=0.0 2023-11-27 00:20:10,921 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 545950 2023-11-27 00:20:11,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3639653.3333333335, ans=0.0 2023-11-27 00:20:15,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3639653.3333333335, ans=0.1 2023-11-27 00:20:23,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3639720.0, ans=0.0 2023-11-27 00:20:26,838 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.00 vs. 
limit=12.0 2023-11-27 00:20:33,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3639786.6666666665, ans=0.2 2023-11-27 00:20:38,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3639786.6666666665, ans=0.125 2023-11-27 00:20:44,965 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4900, loss[loss=0.08091, simple_loss=0.1077, pruned_loss=0.01898, audio_tagging_loss=0.008089, over 15664.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08994, pruned_loss=0.01216, audio_tagging_loss=0.008937, over 3050356.82 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:20:50,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3639853.3333333335, ans=0.1 2023-11-27 00:20:52,387 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.349e+01 8.929e+01 9.407e+01 1.023e+02 1.723e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-27 00:21:02,737 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.72 vs. limit=22.5 2023-11-27 00:21:06,444 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546000 2023-11-27 00:21:27,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3640053.3333333335, ans=0.125 2023-11-27 00:21:36,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3640120.0, ans=0.125 2023-11-27 00:21:40,256 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 4950, loss[loss=0.04944, simple_loss=0.06975, pruned_loss=0.007547, audio_tagging_loss=0.007024, over 15225.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08977, pruned_loss=0.01215, audio_tagging_loss=0.008702, over 3049993.53 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:21:43,907 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.73 vs. 
limit=22.5 2023-11-27 00:21:50,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3640253.3333333335, ans=0.0 2023-11-27 00:21:52,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3640253.3333333335, ans=0.125 2023-11-27 00:21:52,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=3640253.3333333335, ans=12.0 2023-11-27 00:21:59,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3640253.3333333335, ans=0.0 2023-11-27 00:22:01,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3640253.3333333335, ans=0.125 2023-11-27 00:22:02,998 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546050 2023-11-27 00:22:06,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3640320.0, ans=0.0 2023-11-27 00:22:14,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3640386.6666666665, ans=0.1 2023-11-27 00:22:26,510 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.69 vs. limit=15.0 2023-11-27 00:22:27,260 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 00:22:35,936 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5000, loss[loss=0.06893, simple_loss=0.09222, pruned_loss=0.01069, audio_tagging_loss=0.01213, over 15107.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09024, pruned_loss=0.01214, audio_tagging_loss=0.008556, over 3057993.36 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:22:44,481 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.730e+01 8.925e+01 9.606e+01 1.023e+02 1.240e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-27 00:22:59,165 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546100 2023-11-27 00:23:04,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3640653.3333333335, ans=0.09899494936611666 2023-11-27 00:23:31,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3640853.3333333335, ans=0.0 2023-11-27 00:23:32,435 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5050, loss[loss=0.06258, simple_loss=0.08413, pruned_loss=0.01197, audio_tagging_loss=0.008548, over 15208.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09076, pruned_loss=0.01223, audio_tagging_loss=0.008513, over 3056437.29 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:23:54,307 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546150 2023-11-27 00:24:10,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3641053.3333333335, ans=0.125 2023-11-27 00:24:13,778 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.56 vs. 
2023-11-27 00:24:22,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3641120.0, ans=0.125
2023-11-27 00:24:28,503 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5100, loss[loss=0.05472, simple_loss=0.07268, pruned_loss=0.009722, audio_tagging_loss=0.008657, over 13962.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.0891, pruned_loss=0.01184, audio_tagging_loss=0.008592, over 3048297.54 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 32.0
2023-11-27 00:24:35,999 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.921e+01 9.596e+01 1.036e+02 1.225e+02, threshold=1.919e+02, percent-clipped=0.0
2023-11-27 00:24:51,077 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546200
2023-11-27 00:24:56,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3641320.0, ans=0.125
2023-11-27 00:25:06,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3641386.6666666665, ans=0.125
2023-11-27 00:25:13,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3641453.3333333335, ans=0.0
2023-11-27 00:25:18,431 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.43 vs. limit=12.0
2023-11-27 00:25:21,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3641453.3333333335, ans=0.125
2023-11-27 00:25:24,922 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5150, loss[loss=0.06856, simple_loss=0.09719, pruned_loss=0.01267, audio_tagging_loss=0.007292, over 15734.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08921, pruned_loss=0.01199, audio_tagging_loss=0.008542, over 3041960.89 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0
2023-11-27 00:25:34,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3641520.0, ans=0.125
2023-11-27 00:25:47,970 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546250
2023-11-27 00:25:49,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3641653.3333333335, ans=0.1
2023-11-27 00:26:07,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3641720.0, ans=0.125
2023-11-27 00:26:20,888 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5200, loss[loss=0.0336, simple_loss=0.04433, pruned_loss=0.002369, audio_tagging_loss=0.009068, over 16956.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08869, pruned_loss=0.01188, audio_tagging_loss=0.008508, over 3053390.41 frames. ], batch size: 66, lr: 1.47e-03, grad_scale: 32.0
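The loss[...] / tot_loss[...] entries decompose the training objective. Taking the tot_loss of batch 5200 just above together with the scales from the run's startup configuration (simple_loss_scale=0.5, audio_tagging_loss_scale=1.0), the total is reproduced by weighting the simple transducer loss by 0.5 and adding the pruned and audio-tagging terms; the pruned term's weight of 1.0 is inferred from the numbers, not quoted from train_asr.py:

    def combined_loss(simple_loss: float, pruned_loss: float,
                      audio_tagging_loss: float,
                      simple_loss_scale: float = 0.5,
                      audio_tagging_loss_scale: float = 1.0) -> float:
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    # tot_loss at batch 5200: 0.5*0.08869 + 0.01188 + 0.008508 = 0.06473
    assert abs(combined_loss(0.08869, 0.01188, 0.008508) - 0.06473) < 5e-5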
2023-11-27 00:26:24,317 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-27 00:26:29,252 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.652e+01 9.022e+01 9.726e+01 1.018e+02 1.270e+02, threshold=1.945e+02, percent-clipped=0.0
2023-11-27 00:26:43,156 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546300
2023-11-27 00:26:51,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3641986.6666666665, ans=0.125
2023-11-27 00:27:15,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3642186.6666666665, ans=0.1
2023-11-27 00:27:16,515 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5250, loss[loss=0.05535, simple_loss=0.06674, pruned_loss=0.01063, audio_tagging_loss=0.01135, over 14446.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08931, pruned_loss=0.01186, audio_tagging_loss=0.008594, over 3055424.14 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0
2023-11-27 00:27:22,213 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.08 vs. limit=22.5
2023-11-27 00:27:38,980 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546350
2023-11-27 00:27:39,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3642320.0, ans=0.125
2023-11-27 00:27:44,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3642320.0, ans=0.125
2023-11-27 00:27:46,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3642320.0, ans=0.125
2023-11-27 00:27:52,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3642386.6666666665, ans=0.1
2023-11-27 00:27:52,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3642386.6666666665, ans=0.2
2023-11-27 00:27:54,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3642386.6666666665, ans=0.125
2023-11-27 00:28:03,083 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.31 vs. limit=12.0
2023-11-27 00:28:05,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=3642453.3333333335, ans=0.05
2023-11-27 00:28:11,790 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5300, loss[loss=0.08682, simple_loss=0.1185, pruned_loss=0.02014, audio_tagging_loss=0.007413, over 14937.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.0897, pruned_loss=0.01202, audio_tagging_loss=0.008544, over 3053099.06 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 8.0
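grad_scale drops from 32.0 at batch 5200 to 16.0 at batch 5250 and 8.0 at batch 5300, then recovers to 16.0 and 32.0 later in the epoch. This is the usual dynamic-loss-scaling pattern of mixed-precision training (use_fp16=True at startup): the scaler halves its scale whenever a step produces inf/nan gradients and doubles it again after a sustained run of clean steps. A sketch with torch's stock scaler; the growth_interval value is illustrative, and icefall wraps this logic in its own training loop:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0,       # the scale logged at the start of this stretch
        backoff_factor=0.5,    # 32 -> 16 -> 8 on overflowing steps
        growth_factor=2.0,     # 8 -> 16 -> 32 once gradients are clean again
        growth_interval=2000,  # clean steps required before each doubling
    )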
2023-11-27 00:28:22,507 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.462e+01 9.037e+01 9.686e+01 1.067e+02 1.240e+02, threshold=1.937e+02, percent-clipped=0.0
2023-11-27 00:28:35,409 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546400
2023-11-27 00:28:35,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3642653.3333333335, ans=0.125
2023-11-27 00:28:46,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3642720.0, ans=0.125
2023-11-27 00:29:04,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3642786.6666666665, ans=0.1
2023-11-27 00:29:08,591 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5350, loss[loss=0.0606, simple_loss=0.07658, pruned_loss=0.01199, audio_tagging_loss=0.01032, over 14123.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08938, pruned_loss=0.01197, audio_tagging_loss=0.008618, over 3043797.76 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 8.0
2023-11-27 00:29:12,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3642853.3333333335, ans=0.95
2023-11-27 00:29:31,164 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546450
2023-11-27 00:29:31,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3642986.6666666665, ans=0.125
2023-11-27 00:29:45,092 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 00:29:48,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3643053.3333333335, ans=0.2
2023-11-27 00:29:53,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3643120.0, ans=0.0
2023-11-27 00:30:05,160 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5400, loss[loss=0.06602, simple_loss=0.0921, pruned_loss=0.01428, audio_tagging_loss=0.005691, over 15451.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08971, pruned_loss=0.01205, audio_tagging_loss=0.008545, over 3043376.05 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 8.0
2023-11-27 00:30:10,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3643186.6666666665, ans=0.0
2023-11-27 00:30:14,662 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.823e+01 8.994e+01 9.613e+01 1.047e+02 1.327e+02, threshold=1.923e+02, percent-clipped=0.0
2023-11-27 00:30:27,029 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546500
2023-11-27 00:30:35,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3643320.0, ans=0.125
2023-11-27 00:30:43,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3643386.6666666665, ans=0.0
2023-11-27 00:31:00,369 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5450, loss[loss=0.07271, simple_loss=0.09583, pruned_loss=0.01471, audio_tagging_loss=0.01009, over 15423.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08937, pruned_loss=0.01199, audio_tagging_loss=0.008606, over 3043738.79 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 8.0
2023-11-27 00:31:23,090 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546550
2023-11-27 00:31:42,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3643720.0, ans=0.2
2023-11-27 00:31:45,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3643786.6666666665, ans=0.125
2023-11-27 00:31:50,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3643786.6666666665, ans=0.2
2023-11-27 00:31:53,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3643786.6666666665, ans=0.125
2023-11-27 00:31:55,206 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.81 vs. limit=22.5
2023-11-27 00:31:55,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3643853.3333333335, ans=0.125
2023-11-27 00:31:56,315 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.27 vs. limit=15.0
2023-11-27 00:31:56,713 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5500, loss[loss=0.07419, simple_loss=0.1055, pruned_loss=0.01304, audio_tagging_loss=0.008412, over 15604.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.0898, pruned_loss=0.01214, audio_tagging_loss=0.008617, over 3044543.91 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 8.0
2023-11-27 00:32:05,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3643853.3333333335, ans=0.125
2023-11-27 00:32:06,999 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.209e+01 8.879e+01 9.698e+01 1.044e+02 1.314e+02, threshold=1.940e+02, percent-clipped=0.0
2023-11-27 00:32:08,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3643920.0, ans=0.125
2023-11-27 00:32:11,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3643920.0, ans=0.125
2023-11-27 00:32:19,449 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546600
2023-11-27 00:32:45,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3644120.0, ans=0.0
2023-11-27 00:32:53,006 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5550, loss[loss=0.06593, simple_loss=0.0816, pruned_loss=0.01392, audio_tagging_loss=0.01121, over 15222.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08902, pruned_loss=0.01191, audio_tagging_loss=0.008768, over 3043726.58 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 8.0
2023-11-27 00:32:57,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3644186.6666666665, ans=0.125
2023-11-27 00:33:03,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3644253.3333333335, ans=0.125
2023-11-27 00:33:03,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3644253.3333333335, ans=0.0
2023-11-27 00:33:15,229 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546650
2023-11-27 00:33:33,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3644386.6666666665, ans=0.2
2023-11-27 00:33:37,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3644453.3333333335, ans=0.0
2023-11-27 00:33:48,619 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5600, loss[loss=0.06044, simple_loss=0.07733, pruned_loss=0.009617, audio_tagging_loss=0.01216, over 14876.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08866, pruned_loss=0.01189, audio_tagging_loss=0.008911, over 3046020.15 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0
2023-11-27 00:33:58,689 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.835e+01 9.433e+01 1.028e+02 1.297e+02, threshold=1.887e+02, percent-clipped=0.0
2023-11-27 00:33:59,246 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.78 vs. limit=22.5
2023-11-27 00:34:06,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3644586.6666666665, ans=0.0
2023-11-27 00:34:11,089 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546700
2023-11-27 00:34:17,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3644653.3333333335, ans=0.125
2023-11-27 00:34:22,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3644720.0, ans=0.0
2023-11-27 00:34:28,955 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 00:34:30,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3644720.0, ans=0.05
2023-11-27 00:34:33,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3644786.6666666665, ans=0.125
2023-11-27 00:34:40,618 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.84 vs. limit=15.0
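The WARNING above shows why some AudioSet cuts are unusable for the ASR branch: they carry a dummy transcript, and a 1.000 s cut keeps only 23 frames after the roughly 4x subsampling frontend (about (100 - 7) // 4 here), fewer than its 24 BPE tokens. A transducer alignment needs at least one frame per emitted token, so the cut is dropped. A minimal sketch of the filter, with illustrative names:

    def keep_cut(num_frames_after_subsampling: int, num_tokens: int) -> bool:
        # transducer training requires T >= U (one frame per emitted token)
        return num_frames_after_subsampling >= num_tokens

    assert not keep_cut(23, 24)  # the excluded cut above: 23 frames, 24 tokens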
2023-11-27 00:34:44,733 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5650, loss[loss=0.08786, simple_loss=0.1232, pruned_loss=0.01641, audio_tagging_loss=0.009862, over 15460.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08853, pruned_loss=0.01191, audio_tagging_loss=0.008937, over 3051596.36 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0
2023-11-27 00:34:52,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3644853.3333333335, ans=10.0
2023-11-27 00:35:04,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3644920.0, ans=0.0
2023-11-27 00:35:06,495 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546750
2023-11-27 00:35:28,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3645120.0, ans=0.2
2023-11-27 00:35:31,250 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.84 vs. limit=22.5
2023-11-27 00:35:40,809 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5700, loss[loss=0.04269, simple_loss=0.05529, pruned_loss=0.00615, audio_tagging_loss=0.008889, over 13597.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08811, pruned_loss=0.01175, audio_tagging_loss=0.008921, over 3047081.27 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0
2023-11-27 00:35:43,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3645186.6666666665, ans=10.0
2023-11-27 00:35:50,416 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.868e+01 8.853e+01 9.368e+01 1.022e+02 1.504e+02, threshold=1.874e+02, percent-clipped=0.0
2023-11-27 00:35:55,770 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.33 vs. limit=6.0
2023-11-27 00:35:56,674 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.76 vs. limit=15.0
2023-11-27 00:35:59,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3645253.3333333335, ans=0.125
2023-11-27 00:36:03,315 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546800
2023-11-27 00:36:12,976 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.51 vs. limit=22.5
2023-11-27 00:36:21,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3645386.6666666665, ans=0.125
2023-11-27 00:36:26,249 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.11 vs. limit=12.0
2023-11-27 00:36:36,198 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5750, loss[loss=0.08234, simple_loss=0.1195, pruned_loss=0.01688, audio_tagging_loss=0.005731, over 15367.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08869, pruned_loss=0.01187, audio_tagging_loss=0.008814, over 3046645.12 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0
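The Whitening lines track how far the covariance of a module's activations is from a multiple of the identity; the metric is ~1.0 for perfectly "white" features and grows with the eigenvalue spread, and a corrective gradient presumably kicks in only when it exceeds the (possibly scheduled) limit, as in 21.84 vs. 22.5 above. A sketch of that style of statistic, mirroring the idea in scaling.py rather than copying its implementation:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels) activations for one group
        x = x - x.mean(dim=0, keepdim=True)
        cov = x.t() @ x / x.shape[0]
        d = cov.shape[0]
        # d * trace(cov^2) / trace(cov)^2 == 1.0 iff all eigenvalues are equal
        return d * (cov @ cov).diagonal().sum() / cov.diagonal().sum() ** 2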
2023-11-27 00:36:36,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3645520.0, ans=0.0
2023-11-27 00:36:59,308 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546850
2023-11-27 00:37:21,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3645786.6666666665, ans=0.125
2023-11-27 00:37:32,707 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5800, loss[loss=0.06852, simple_loss=0.0974, pruned_loss=0.01392, audio_tagging_loss=0.0059, over 15179.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.0895, pruned_loss=0.01206, audio_tagging_loss=0.008669, over 3050455.53 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0
2023-11-27 00:37:32,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3645853.3333333335, ans=0.1
2023-11-27 00:37:42,694 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.557e+01 8.951e+01 9.661e+01 1.044e+02 1.253e+02, threshold=1.932e+02, percent-clipped=0.0
2023-11-27 00:37:48,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3645920.0, ans=0.0
2023-11-27 00:37:55,094 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546900
2023-11-27 00:38:14,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3646053.3333333335, ans=0.125
2023-11-27 00:38:19,661 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.51 vs. limit=10.0
2023-11-27 00:38:20,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3646120.0, ans=0.0
2023-11-27 00:38:29,093 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5850, loss[loss=0.08241, simple_loss=0.1138, pruned_loss=0.01899, audio_tagging_loss=0.006523, over 15506.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08995, pruned_loss=0.01203, audio_tagging_loss=0.008588, over 3052827.04 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0
2023-11-27 00:38:50,961 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 546950
2023-11-27 00:38:55,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3646320.0, ans=0.1
2023-11-27 00:39:00,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3646320.0, ans=0.0
2023-11-27 00:39:13,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3646453.3333333335, ans=0.125
2023-11-27 00:39:24,503 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5900, loss[loss=0.06622, simple_loss=0.08407, pruned_loss=0.01309, audio_tagging_loss=0.0111, over 14453.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08948, pruned_loss=0.01192, audio_tagging_loss=0.008609, over 3060196.29 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0
2023-11-27 00:39:34,483 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.679e+01 8.740e+01 9.357e+01 9.859e+01 1.378e+02, threshold=1.871e+02, percent-clipped=0.0
2023-11-27 00:39:47,249 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547000
2023-11-27 00:39:57,217 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.77 vs. limit=22.5
2023-11-27 00:40:01,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3646720.0, ans=0.0
2023-11-27 00:40:08,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3646786.6666666665, ans=0.2
2023-11-27 00:40:13,999 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.58 vs. limit=15.0
2023-11-27 00:40:16,513 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.95 vs. limit=15.0
2023-11-27 00:40:20,824 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 5950, loss[loss=0.1046, simple_loss=0.1396, pruned_loss=0.02711, audio_tagging_loss=0.007732, over 15333.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08962, pruned_loss=0.01207, audio_tagging_loss=0.008705, over 3055247.61 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0
2023-11-27 00:40:24,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3646853.3333333335, ans=0.1
2023-11-27 00:40:38,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3646920.0, ans=0.125
2023-11-27 00:40:43,311 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547050
2023-11-27 00:40:45,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3646986.6666666665, ans=0.07
2023-11-27 00:40:50,288 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.81 vs. limit=15.0
2023-11-27 00:41:16,162 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6000, loss[loss=0.04782, simple_loss=0.06465, pruned_loss=0.007544, audio_tagging_loss=0.007946, over 15635.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08988, pruned_loss=0.01202, audio_tagging_loss=0.008523, over 3051098.64 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 32.0
2023-11-27 00:41:16,164 INFO [train_asr.py:1258] (0/4) Computing validation loss
2023-11-27 00:41:48,449 INFO [train_asr.py:1267] (0/4) Epoch 46, validation: loss=0.05759, simple_loss=0.05057, pruned_loss=0.005367, audio_tagging_loss=0.02694, over 4681554.00 frames.
2023-11-27 00:41:48,449 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB
2023-11-27 00:41:53,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3647186.6666666665, ans=0.0
2023-11-27 00:41:54,248 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.03 vs. limit=22.5
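The validation pass above lands exactly on batch 6000, consistent with the valid_interval of 3000 from the run's startup configuration; note also that the validation loss 0.05759 obeys the same weighting as the training loss (0.5*0.05057 + 0.005367 + 0.02694 = 0.05759), with the audio-tagging term dominating on the AudioSet eval cuts. A sketch of the trigger, with illustrative names:

    def should_validate(batch_idx: int, valid_interval: int = 3000) -> bool:
        return batch_idx > 0 and batch_idx % valid_interval == 0

    assert should_validate(6000) and not should_validate(5950)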
2023-11-27 00:41:57,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3647186.6666666665, ans=0.125
2023-11-27 00:41:58,336 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.728e+01 8.712e+01 9.506e+01 1.018e+02 1.169e+02, threshold=1.901e+02, percent-clipped=0.0
2023-11-27 00:42:04,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3647253.3333333335, ans=0.125
2023-11-27 00:42:09,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3647320.0, ans=0.125
2023-11-27 00:42:10,651 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547100
2023-11-27 00:42:27,945 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 00:42:34,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3647453.3333333335, ans=0.125
2023-11-27 00:42:44,301 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6050, loss[loss=0.07507, simple_loss=0.1034, pruned_loss=0.01726, audio_tagging_loss=0.006097, over 14996.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.0899, pruned_loss=0.01206, audio_tagging_loss=0.008473, over 3046246.85 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 32.0
2023-11-27 00:43:05,759 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.37 vs. limit=12.0
2023-11-27 00:43:06,126 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547150
2023-11-27 00:43:11,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3647653.3333333335, ans=0.125
2023-11-27 00:43:25,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3647720.0, ans=0.0
2023-11-27 00:43:25,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3647720.0, ans=0.2
2023-11-27 00:43:31,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3647786.6666666665, ans=0.125
2023-11-27 00:43:35,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3647786.6666666665, ans=0.125
2023-11-27 00:43:35,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3647786.6666666665, ans=0.125
2023-11-27 00:43:39,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3647853.3333333335, ans=0.125
2023-11-27 00:43:40,368 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6100, loss[loss=0.06023, simple_loss=0.07994, pruned_loss=0.01289, audio_tagging_loss=0.007369, over 16449.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08879, pruned_loss=0.0119, audio_tagging_loss=0.008457, over 3041001.75 frames. ], batch size: 62, lr: 1.47e-03, grad_scale: 32.0
2023-11-27 00:43:44,023 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.28 vs. limit=22.5
2023-11-27 00:43:49,748 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.697e+01 8.942e+01 9.763e+01 1.039e+02 1.274e+02, threshold=1.953e+02, percent-clipped=0.0
2023-11-27 00:43:50,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3647920.0, ans=0.2
2023-11-27 00:43:51,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3647920.0, ans=0.2
2023-11-27 00:44:01,498 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547200
2023-11-27 00:44:01,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3647986.6666666665, ans=0.1
2023-11-27 00:44:20,328 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.96 vs. limit=22.5
2023-11-27 00:44:22,415 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.23 vs. limit=10.0
2023-11-27 00:44:26,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3648120.0, ans=0.0
2023-11-27 00:44:27,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3648120.0, ans=0.0
2023-11-27 00:44:30,717 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 00:44:35,848 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6150, loss[loss=0.06187, simple_loss=0.08623, pruned_loss=0.01097, audio_tagging_loss=0.007777, over 15372.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.0889, pruned_loss=0.01198, audio_tagging_loss=0.008534, over 3036487.89 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0
2023-11-27 00:44:49,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.03 vs. limit=12.0
2023-11-27 00:44:53,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3648253.3333333335, ans=0.125
2023-11-27 00:44:58,704 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547250
2023-11-27 00:45:10,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3648386.6666666665, ans=0.0
2023-11-27 00:45:12,438 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.16 vs. limit=22.5
2023-11-27 00:45:21,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3648453.3333333335, ans=0.125
2023-11-27 00:45:23,871 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.62 vs. limit=15.0
2023-11-27 00:45:31,495 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6200, loss[loss=0.06253, simple_loss=0.07907, pruned_loss=0.01265, audio_tagging_loss=0.01034, over 15036.00 frames. ], tot_loss[loss=0.065, simple_loss=0.0888, pruned_loss=0.01205, audio_tagging_loss=0.00855, over 3032406.61 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0
2023-11-27 00:45:43,670 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.816e+01 8.925e+01 9.447e+01 1.055e+02 1.440e+02, threshold=1.889e+02, percent-clipped=0.0
2023-11-27 00:45:54,298 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547300
2023-11-27 00:46:16,042 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.58 vs. limit=10.0
2023-11-27 00:46:19,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3648786.6666666665, ans=0.5
2023-11-27 00:46:28,167 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6250, loss[loss=0.05676, simple_loss=0.08084, pruned_loss=0.00828, audio_tagging_loss=0.008067, over 15334.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.0883, pruned_loss=0.01191, audio_tagging_loss=0.008649, over 3033972.99 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0
2023-11-27 00:46:38,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3648920.0, ans=0.125
2023-11-27 00:46:42,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3648920.0, ans=0.0
2023-11-27 00:46:49,453 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547350
2023-11-27 00:47:04,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3649053.3333333335, ans=0.125
2023-11-27 00:47:13,669 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.80 vs. limit=15.0
2023-11-27 00:47:15,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3649120.0, ans=0.1
2023-11-27 00:47:22,753 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6300, loss[loss=0.06195, simple_loss=0.08725, pruned_loss=0.00854, audio_tagging_loss=0.009788, over 16246.00 frames. ], tot_loss[loss=0.06421, simple_loss=0.08727, pruned_loss=0.01182, audio_tagging_loss=0.008751, over 3043019.86 frames. ], batch size: 62, lr: 1.47e-03, grad_scale: 16.0
2023-11-27 00:47:30,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3649186.6666666665, ans=0.1
2023-11-27 00:47:33,386 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.827e+01 9.482e+01 1.035e+02 1.564e+02, threshold=1.896e+02, percent-clipped=0.0
2023-11-27 00:47:45,069 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547400
2023-11-27 00:47:55,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3649320.0, ans=0.125
2023-11-27 00:48:09,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3649453.3333333335, ans=0.125
2023-11-27 00:48:15,406 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.48 vs. limit=12.0
2023-11-27 00:48:18,650 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6350, loss[loss=0.07136, simple_loss=0.09538, pruned_loss=0.01301, audio_tagging_loss=0.01066, over 15683.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08837, pruned_loss=0.01193, audio_tagging_loss=0.008692, over 3039962.70 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0
2023-11-27 00:48:41,690 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547450
2023-11-27 00:48:41,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3649653.3333333335, ans=0.0
2023-11-27 00:48:50,923 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.19 vs. limit=15.0
2023-11-27 00:48:53,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3649720.0, ans=0.0
2023-11-27 00:48:54,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3649720.0, ans=0.1
2023-11-27 00:48:55,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3649720.0, ans=0.1
2023-11-27 00:49:15,282 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6400, loss[loss=0.07893, simple_loss=0.1038, pruned_loss=0.01829, audio_tagging_loss=0.008756, over 14171.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08842, pruned_loss=0.01194, audio_tagging_loss=0.008749, over 3041295.89 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 32.0
2023-11-27 00:49:26,398 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.405e+01 8.880e+01 9.472e+01 1.045e+02 1.391e+02, threshold=1.894e+02, percent-clipped=0.0
2023-11-27 00:49:29,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3649920.0, ans=0.0
2023-11-27 00:49:37,183 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547500
2023-11-27 00:49:39,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3649986.6666666665, ans=0.1
2023-11-27 00:49:40,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3649986.6666666665, ans=0.07
2023-11-27 00:49:45,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3649986.6666666665, ans=0.0
2023-11-27 00:49:57,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3650053.3333333335, ans=0.125
2023-11-27 00:50:05,449 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.53 vs. limit=22.5
2023-11-27 00:50:09,419 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.21 vs. limit=10.0
2023-11-27 00:50:11,017 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6450, loss[loss=0.05956, simple_loss=0.07724, pruned_loss=0.01113, audio_tagging_loss=0.009799, over 15244.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08829, pruned_loss=0.01193, audio_tagging_loss=0.008832, over 3039499.33 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0
2023-11-27 00:50:22,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3650253.3333333335, ans=0.0
2023-11-27 00:50:30,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3650253.3333333335, ans=0.125
2023-11-27 00:50:33,257 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547550
2023-11-27 00:51:05,929 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6500, loss[loss=0.06972, simple_loss=0.09138, pruned_loss=0.01528, audio_tagging_loss=0.008755, over 14037.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08844, pruned_loss=0.01206, audio_tagging_loss=0.008821, over 3035642.91 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 32.0
2023-11-27 00:51:07,194 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 00:51:17,708 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.492e+01 8.951e+01 9.386e+01 1.000e+02 1.193e+02, threshold=1.877e+02, percent-clipped=0.0
2023-11-27 00:51:20,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3650586.6666666665, ans=0.0
2023-11-27 00:51:29,076 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547600
2023-11-27 00:52:02,854 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6550, loss[loss=0.05847, simple_loss=0.08214, pruned_loss=0.009629, audio_tagging_loss=0.007774, over 15150.00 frames. ], tot_loss[loss=0.06439, simple_loss=0.0876, pruned_loss=0.01183, audio_tagging_loss=0.008761, over 3039546.59 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0
2023-11-27 00:52:25,146 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547650
2023-11-27 00:52:48,269 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.32 vs. limit=15.0
2023-11-27 00:52:53,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3651120.0, ans=0.125
2023-11-27 00:52:54,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3651120.0, ans=0.07
2023-11-27 00:52:58,297 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6600, loss[loss=0.06146, simple_loss=0.0853, pruned_loss=0.01044, audio_tagging_loss=0.008369, over 15316.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.08817, pruned_loss=0.01185, audio_tagging_loss=0.008659, over 3044427.75 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0
2023-11-27 00:52:59,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3651186.6666666665, ans=0.125
2023-11-27 00:53:03,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3651186.6666666665, ans=0.125
2023-11-27 00:53:09,343 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.438e+01 8.821e+01 9.435e+01 1.031e+02 1.384e+02, threshold=1.887e+02, percent-clipped=0.0
2023-11-27 00:53:21,057 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547700
2023-11-27 00:53:27,668 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 00:53:54,143 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6650, loss[loss=0.04344, simple_loss=0.05881, pruned_loss=0.005798, audio_tagging_loss=0.00824, over 14796.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08886, pruned_loss=0.01202, audio_tagging_loss=0.008598, over 3041411.35 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0
2023-11-27 00:53:55,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3651520.0, ans=0.0
2023-11-27 00:54:09,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=3651586.6666666665, ans=12.0
2023-11-27 00:54:11,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3651586.6666666665, ans=0.125
2023-11-27 00:54:17,095 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547750
2023-11-27 00:54:23,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3651653.3333333335, ans=0.0
2023-11-27 00:54:45,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3651786.6666666665, ans=0.125
2023-11-27 00:54:46,819 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.72 vs. limit=15.0
2023-11-27 00:54:50,318 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6700, loss[loss=0.06404, simple_loss=0.08664, pruned_loss=0.01168, audio_tagging_loss=0.009051, over 15224.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08939, pruned_loss=0.01196, audio_tagging_loss=0.008507, over 3046466.03 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0
2023-11-27 00:54:58,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3651853.3333333335, ans=0.125
2023-11-27 00:54:59,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3651853.3333333335, ans=0.125
2023-11-27 00:55:01,534 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.189e+01 8.865e+01 9.450e+01 1.017e+02 1.235e+02, threshold=1.890e+02, percent-clipped=0.0
2023-11-27 00:55:12,872 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547800
2023-11-27 00:55:21,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3651986.6666666665, ans=0.125
2023-11-27 00:55:30,727 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.63 vs. limit=15.0
2023-11-27 00:55:31,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3652053.3333333335, ans=0.1
2023-11-27 00:55:41,892 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.12 vs. limit=15.0
2023-11-27 00:55:46,485 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6750, loss[loss=0.05788, simple_loss=0.08657, pruned_loss=0.007904, audio_tagging_loss=0.006695, over 16546.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08987, pruned_loss=0.01207, audio_tagging_loss=0.008498, over 3041704.39 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 32.0
2023-11-27 00:55:49,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3652186.6666666665, ans=0.0
2023-11-27 00:55:52,174 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.83 vs. limit=10.0
2023-11-27 00:55:53,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3652186.6666666665, ans=0.125
2023-11-27 00:55:56,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3652253.3333333335, ans=0.125
2023-11-27 00:56:02,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3652253.3333333335, ans=0.125
2023-11-27 00:56:04,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3652253.3333333335, ans=0.0
2023-11-27 00:56:09,334 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547850
2023-11-27 00:56:17,318 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.32 vs. limit=22.5
2023-11-27 00:56:20,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3652386.6666666665, ans=0.125
2023-11-27 00:56:32,564 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.11 vs. limit=15.0
2023-11-27 00:56:42,052 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6800, loss[loss=0.08605, simple_loss=0.1229, pruned_loss=0.01634, audio_tagging_loss=0.008264, over 15707.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.09008, pruned_loss=0.01215, audio_tagging_loss=0.008369, over 3044418.48 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0
2023-11-27 00:56:53,721 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.866e+01 8.978e+01 9.815e+01 1.051e+02 1.384e+02, threshold=1.963e+02, percent-clipped=0.0
2023-11-27 00:56:55,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3652586.6666666665, ans=0.125
2023-11-27 00:57:01,893 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.98 vs. limit=15.0
2023-11-27 00:57:04,888 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547900
2023-11-27 00:57:06,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3652653.3333333335, ans=0.125
2023-11-27 00:57:38,343 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6850, loss[loss=0.06354, simple_loss=0.08, pruned_loss=0.01274, audio_tagging_loss=0.0108, over 15266.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08954, pruned_loss=0.01212, audio_tagging_loss=0.008501, over 3043490.68 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0
2023-11-27 00:57:40,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3652853.3333333335, ans=0.125
2023-11-27 00:58:00,220 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 547950
2023-11-27 00:58:09,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3652986.6666666665, ans=0.2
2023-11-27 00:58:12,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3653053.3333333335, ans=0.2
2023-11-27 00:58:26,948 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.56 vs. limit=10.0
2023-11-27 00:58:30,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3653120.0, ans=0.125
2023-11-27 00:58:34,327 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6900, loss[loss=0.0516, simple_loss=0.06925, pruned_loss=0.007072, audio_tagging_loss=0.009902, over 15009.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08934, pruned_loss=0.01195, audio_tagging_loss=0.008551, over 3041402.88 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0
2023-11-27 00:58:36,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3653186.6666666665, ans=0.0
2023-11-27 00:58:39,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3653186.6666666665, ans=0.0
2023-11-27 00:58:41,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3653186.6666666665, ans=0.125
2023-11-27 00:58:45,073 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.731e+01 8.904e+01 9.598e+01 1.032e+02 1.208e+02, threshold=1.920e+02, percent-clipped=0.0
2023-11-27 00:58:56,296 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548000
2023-11-27 00:58:58,164 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-548000.pt
2023-11-27 00:59:09,145 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.90 vs. limit=15.0
2023-11-27 00:59:12,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3653386.6666666665, ans=0.125
2023-11-27 00:59:16,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3653386.6666666665, ans=0.07
2023-11-27 00:59:19,660 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 00:59:21,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3653453.3333333335, ans=0.0
2023-11-27 00:59:26,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3653453.3333333335, ans=0.125
2023-11-27 00:59:31,360 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 6950, loss[loss=0.05147, simple_loss=0.0701, pruned_loss=0.006424, audio_tagging_loss=0.009996, over 15886.00 frames. ], tot_loss[loss=0.06469, simple_loss=0.0885, pruned_loss=0.01183, audio_tagging_loss=0.008612, over 3041380.86 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 32.0
2023-11-27 00:59:42,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3653586.6666666665, ans=0.09899494936611666
2023-11-27 00:59:43,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3653586.6666666665, ans=0.125
2023-11-27 00:59:46,405 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.52 vs. limit=15.0
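The checkpoint at batch idx 548000 above is the periodic kind: 548000 is a multiple of save_every_n=4000 from the run's startup configuration, and the file name encodes the global batch index. A sketch of the naming and cadence, with keep_last_k=30 governing how many such files are retained; helper names are illustrative:

    from pathlib import Path

    def periodic_checkpoint(exp_dir: Path, batch_idx_train: int,
                            save_every_n: int = 4000) -> Path | None:
        # save only when the global batch index hits a multiple of save_every_n
        if batch_idx_train == 0 or batch_idx_train % save_every_n != 0:
            return None
        return exp_dir / f"checkpoint-{batch_idx_train}.pt"

    exp_dir = Path("multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0")
    assert periodic_checkpoint(exp_dir, 548000).name == "checkpoint-548000.pt"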
2023-11-27 00:59:51,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3653586.6666666665, ans=0.0
2023-11-27 00:59:54,813 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548050
2023-11-27 00:59:55,415 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.56 vs. limit=22.5
2023-11-27 00:59:57,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3653653.3333333335, ans=0.07
2023-11-27 01:00:22,163 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0
2023-11-27 01:00:27,984 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7000, loss[loss=0.05728, simple_loss=0.07384, pruned_loss=0.01109, audio_tagging_loss=0.009266, over 15413.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08852, pruned_loss=0.01178, audio_tagging_loss=0.008686, over 3041119.30 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 32.0
2023-11-27 01:00:28,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3653853.3333333335, ans=0.125
2023-11-27 01:00:39,178 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.987e+01 8.912e+01 9.354e+01 1.017e+02 1.441e+02, threshold=1.871e+02, percent-clipped=0.0
2023-11-27 01:00:49,659 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548100
2023-11-27 01:01:23,293 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7050, loss[loss=0.05945, simple_loss=0.07477, pruned_loss=0.01253, audio_tagging_loss=0.009539, over 14239.00 frames. ], tot_loss[loss=0.06426, simple_loss=0.08784, pruned_loss=0.01162, audio_tagging_loss=0.00872, over 3040033.51 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0
2023-11-27 01:01:34,149 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 01:01:35,577 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.24 vs. limit=15.0
2023-11-27 01:01:44,580 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548150
2023-11-27 01:01:57,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3654386.6666666665, ans=0.125
2023-11-27 01:02:02,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3654386.6666666665, ans=0.025
2023-11-27 01:02:11,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3654453.3333333335, ans=0.125
2023-11-27 01:02:18,148 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7100, loss[loss=0.06515, simple_loss=0.08991, pruned_loss=0.01364, audio_tagging_loss=0.006559, over 15743.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08854, pruned_loss=0.01185, audio_tagging_loss=0.008741, over 3037252.94 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0
], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:02:23,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3654520.0, ans=0.125 2023-11-27 01:02:29,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3654586.6666666665, ans=0.2 2023-11-27 01:02:30,244 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.735e+01 8.909e+01 9.590e+01 1.018e+02 1.394e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 01:02:34,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3654586.6666666665, ans=0.125 2023-11-27 01:02:39,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3654653.3333333335, ans=0.2 2023-11-27 01:02:40,395 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548200 2023-11-27 01:02:56,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3654720.0, ans=0.025 2023-11-27 01:02:57,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3654720.0, ans=0.2 2023-11-27 01:03:02,901 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.64 vs. limit=10.0 2023-11-27 01:03:06,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3654786.6666666665, ans=0.125 2023-11-27 01:03:06,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3654786.6666666665, ans=0.125 2023-11-27 01:03:13,934 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7150, loss[loss=0.06765, simple_loss=0.09686, pruned_loss=0.01086, audio_tagging_loss=0.008352, over 15055.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08915, pruned_loss=0.01189, audio_tagging_loss=0.008799, over 3042755.02 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:03:17,098 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.61 vs. limit=15.0 2023-11-27 01:03:30,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3654920.0, ans=0.125 2023-11-27 01:03:33,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3654920.0, ans=0.0 2023-11-27 01:03:36,479 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548250 2023-11-27 01:04:09,621 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7200, loss[loss=0.06801, simple_loss=0.08951, pruned_loss=0.01268, audio_tagging_loss=0.01057, over 15365.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08868, pruned_loss=0.01175, audio_tagging_loss=0.008861, over 3035474.77 frames. 
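The optim.py:476 lines print five quantiles (min, 25%, median, 75%, max) of recent gradient norms alongside a clipping threshold, and in every such entry here the threshold equals Clipping_scale (2.0) times the median, e.g. 2.0 x 9.590e+01 = 1.918e+02 just above. A hedged sketch of that rule follows; the real ScaledAdam logic in optim.py is more involved, so treat this as an illustration only.

import torch

# Track recent gradient norms and clip whenever the current norm exceeds
# clipping_scale times their running median, mirroring the quartile printout.
class QuartileClipper:
    def __init__(self, clipping_scale: float = 2.0, history: int = 128):
        self.clipping_scale = clipping_scale
        self.history = history
        self.norms: list[float] = []

    def clip_(self, params) -> float:
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms = (self.norms + [norm])[-self.history:]
        quartiles = torch.quantile(torch.tensor(self.norms),
                                   torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * quartiles[2].item()  # 2.0 x median
        if norm > threshold:
            for g in grads:
                g.mul_(threshold / norm)
        return threshold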
], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:04:09,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3655186.6666666665, ans=0.0 2023-11-27 01:04:13,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3655186.6666666665, ans=0.125 2023-11-27 01:04:15,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3655186.6666666665, ans=0.0 2023-11-27 01:04:15,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3655186.6666666665, ans=0.2 2023-11-27 01:04:22,353 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.766e+01 9.112e+01 9.564e+01 1.040e+02 1.454e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-27 01:04:30,997 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548300 2023-11-27 01:04:34,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3655320.0, ans=0.0 2023-11-27 01:04:36,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3655320.0, ans=0.125 2023-11-27 01:04:39,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3655320.0, ans=0.1 2023-11-27 01:04:54,625 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.77 vs. limit=15.0 2023-11-27 01:05:04,738 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7250, loss[loss=0.07347, simple_loss=0.09864, pruned_loss=0.01542, audio_tagging_loss=0.008729, over 16133.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08958, pruned_loss=0.01192, audio_tagging_loss=0.00896, over 3039098.08 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:05:06,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3655520.0, ans=0.0 2023-11-27 01:05:25,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3655586.6666666665, ans=0.125 2023-11-27 01:05:27,656 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548350 2023-11-27 01:05:28,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3655653.3333333335, ans=0.125 2023-11-27 01:05:30,541 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.31 vs. 
limit=5.0 2023-11-27 01:05:34,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3655653.3333333335, ans=0.1 2023-11-27 01:05:36,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3655653.3333333335, ans=0.1 2023-11-27 01:05:46,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3655720.0, ans=0.125 2023-11-27 01:05:47,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3655720.0, ans=0.09899494936611666 2023-11-27 01:05:59,833 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7300, loss[loss=0.06234, simple_loss=0.08555, pruned_loss=0.0105, audio_tagging_loss=0.009068, over 15616.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08944, pruned_loss=0.0119, audio_tagging_loss=0.008866, over 3039969.00 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:06:07,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3655853.3333333335, ans=0.0 2023-11-27 01:06:13,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3655920.0, ans=0.125 2023-11-27 01:06:13,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3655920.0, ans=0.125 2023-11-27 01:06:14,685 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.602e+01 8.978e+01 9.664e+01 1.039e+02 1.460e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-27 01:06:21,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3655920.0, ans=0.0 2023-11-27 01:06:23,173 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548400 2023-11-27 01:06:36,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3656053.3333333335, ans=0.125 2023-11-27 01:06:41,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3656053.3333333335, ans=0.125 2023-11-27 01:06:47,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3656120.0, ans=0.0 2023-11-27 01:06:53,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3656120.0, ans=10.0 2023-11-27 01:06:55,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3656120.0, ans=0.125 2023-11-27 01:06:57,550 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7350, loss[loss=0.06433, simple_loss=0.09475, pruned_loss=0.01065, audio_tagging_loss=0.006309, over 15106.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08932, pruned_loss=0.012, audio_tagging_loss=0.008747, over 3037800.07 frames. 
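The scaling.py:213 lines are periodic dumps of ScheduledFloat hyperparameters: per-module values such as conv_skip_rate, ff2_skip_rate, dropout_p and balancer probabilities that are functions of the global batch_count rather than constants, which is why each record carries a batch_count and why so many read a flat ans=0.0 or ans=0.125 this late in training. Below is a minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints with clamping outside them; the real class lives in the project's scaling.py, and the breakpoints here are invented for illustration.

# A batch-count-indexed schedule: linear between breakpoints, clamped outside.
class ScheduledValue:
    def __init__(self, *points):  # points: (batch_count, value), ascending
        self.points = list(points)

    def __call__(self, batch_count: float) -> float:
        x0, y0 = self.points[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in self.points[1:]:
            if batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
            x0, y0 = x1, y1
        return y0

skip_rate = ScheduledValue((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
print(skip_rate(3653186.67))  # 0.0: by this batch count the schedule has flattened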
], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:06:58,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3656186.6666666665, ans=0.0 2023-11-27 01:07:00,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3656186.6666666665, ans=0.125 2023-11-27 01:07:04,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3656186.6666666665, ans=0.125 2023-11-27 01:07:18,785 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548450 2023-11-27 01:07:45,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3656453.3333333335, ans=0.125 2023-11-27 01:07:48,730 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.29 vs. limit=12.0 2023-11-27 01:07:52,333 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7400, loss[loss=0.06329, simple_loss=0.09137, pruned_loss=0.009794, audio_tagging_loss=0.007809, over 15611.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08927, pruned_loss=0.01198, audio_tagging_loss=0.008599, over 3045921.10 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:07:52,809 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.43 vs. limit=15.0 2023-11-27 01:08:02,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3656586.6666666665, ans=0.1 2023-11-27 01:08:05,016 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 8.855e+01 9.450e+01 1.015e+02 1.303e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-27 01:08:14,706 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548500 2023-11-27 01:08:18,949 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:08:28,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3656720.0, ans=0.125 2023-11-27 01:08:38,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3656786.6666666665, ans=0.0 2023-11-27 01:08:39,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3656786.6666666665, ans=0.125 2023-11-27 01:08:42,725 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.19 vs. limit=15.0 2023-11-27 01:08:43,520 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:08:47,518 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7450, loss[loss=0.06583, simple_loss=0.08703, pruned_loss=0.0113, audio_tagging_loss=0.01102, over 16533.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08894, pruned_loss=0.01198, audio_tagging_loss=0.008645, over 3053083.18 frames. 
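The scaling.py:1022 "Whitening" lines compare a whiteness metric of a module's activations against a limit (metric=9.29 vs. limit=12.0 and similar), presumably so that the auxiliary whitening penalty only engages once the metric crosses the limit. One plausible metric, shown below purely as an assumption, is the ratio of the mean squared eigenvalue of the channel covariance to the square of its mean eigenvalue: it equals 1.0 for perfectly white activations and grows as a few channels dominate. The actual computation is in the project's scaling.py.

import torch

# Assumed whiteness metric: mean(eig^2) / mean(eig)^2 of the channel
# covariance; 1.0 for white activations, larger when anisotropic.
def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels)
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

x = torch.randn(1000, 192) * torch.linspace(0.1, 3.0, 192)  # unequal channel scales
print(whitening_metric(x))  # noticeably above 1.0, unlike white noise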
], batch size: 62, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:08:56,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3656853.3333333335, ans=0.125 2023-11-27 01:09:04,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3656920.0, ans=0.125 2023-11-27 01:09:10,538 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548550 2023-11-27 01:09:11,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3656986.6666666665, ans=0.125 2023-11-27 01:09:15,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3656986.6666666665, ans=0.0 2023-11-27 01:09:19,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3656986.6666666665, ans=0.07 2023-11-27 01:09:25,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3657053.3333333335, ans=0.1 2023-11-27 01:09:29,343 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.50 vs. limit=15.0 2023-11-27 01:09:30,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3657120.0, ans=0.125 2023-11-27 01:09:38,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=3657120.0, ans=0.2 2023-11-27 01:09:41,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3657120.0, ans=0.125 2023-11-27 01:09:43,442 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7500, loss[loss=0.07799, simple_loss=0.1144, pruned_loss=0.01581, audio_tagging_loss=0.005005, over 15057.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08874, pruned_loss=0.01198, audio_tagging_loss=0.008573, over 3047096.26 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:09:49,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3657186.6666666665, ans=0.2 2023-11-27 01:09:50,703 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.39 vs. 
limit=15.0 2023-11-27 01:09:57,378 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.963e+01 9.690e+01 1.036e+02 1.410e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-27 01:09:57,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3657253.3333333335, ans=0.125 2023-11-27 01:10:05,781 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548600 2023-11-27 01:10:17,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3657386.6666666665, ans=0.125 2023-11-27 01:10:22,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3657386.6666666665, ans=0.0 2023-11-27 01:10:37,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3657453.3333333335, ans=0.125 2023-11-27 01:10:37,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3657453.3333333335, ans=0.1 2023-11-27 01:10:39,353 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7550, loss[loss=0.07051, simple_loss=0.09663, pruned_loss=0.01436, audio_tagging_loss=0.007831, over 14945.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08896, pruned_loss=0.0119, audio_tagging_loss=0.008569, over 3048704.15 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:10:39,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3657520.0, ans=0.0 2023-11-27 01:10:43,093 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.00 vs. limit=22.5 2023-11-27 01:10:59,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3657586.6666666665, ans=0.07 2023-11-27 01:11:01,660 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548650 2023-11-27 01:11:27,361 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.52 vs. limit=15.0 2023-11-27 01:11:34,187 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7600, loss[loss=0.06332, simple_loss=0.09683, pruned_loss=0.008529, audio_tagging_loss=0.006374, over 15988.00 frames. ], tot_loss[loss=0.06453, simple_loss=0.08864, pruned_loss=0.01171, audio_tagging_loss=0.008504, over 3051807.73 frames. 
], batch size: 59, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 01:11:42,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3657853.3333333335, ans=0.2 2023-11-27 01:11:47,970 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.020e+01 8.781e+01 9.560e+01 1.034e+02 1.331e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-27 01:11:57,276 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548700 2023-11-27 01:12:02,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3657986.6666666665, ans=0.0 2023-11-27 01:12:04,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3657986.6666666665, ans=0.0 2023-11-27 01:12:15,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3658053.3333333335, ans=0.125 2023-11-27 01:12:30,370 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7650, loss[loss=0.06475, simple_loss=0.09144, pruned_loss=0.01179, audio_tagging_loss=0.007239, over 15860.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08923, pruned_loss=0.01177, audio_tagging_loss=0.008509, over 3057014.27 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 01:12:36,787 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.46 vs. limit=15.0 2023-11-27 01:12:39,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3658186.6666666665, ans=0.0 2023-11-27 01:12:52,769 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548750 2023-11-27 01:13:02,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3658386.6666666665, ans=0.125 2023-11-27 01:13:03,185 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.01 vs. limit=8.0 2023-11-27 01:13:26,552 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7700, loss[loss=0.09539, simple_loss=0.1333, pruned_loss=0.01975, audio_tagging_loss=0.009009, over 15775.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.09, pruned_loss=0.01178, audio_tagging_loss=0.008463, over 3061453.38 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:13:33,188 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.65 vs. 
limit=22.5 2023-11-27 01:13:35,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3658520.0, ans=0.125 2023-11-27 01:13:40,201 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.592e+01 8.982e+01 9.750e+01 1.038e+02 1.363e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-27 01:13:48,762 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548800 2023-11-27 01:14:01,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3658720.0, ans=0.125 2023-11-27 01:14:07,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3658720.0, ans=10.0 2023-11-27 01:14:21,517 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7750, loss[loss=0.06048, simple_loss=0.08114, pruned_loss=0.01147, audio_tagging_loss=0.008436, over 15015.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08953, pruned_loss=0.01185, audio_tagging_loss=0.008548, over 3067762.24 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:14:25,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3658853.3333333335, ans=0.2 2023-11-27 01:14:39,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3658920.0, ans=0.025 2023-11-27 01:14:44,213 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548850 2023-11-27 01:14:56,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3659053.3333333335, ans=0.0 2023-11-27 01:15:02,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3659053.3333333335, ans=0.0 2023-11-27 01:15:05,054 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.78 vs. limit=15.0 2023-11-27 01:15:17,384 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7800, loss[loss=0.07227, simple_loss=0.1073, pruned_loss=0.0115, audio_tagging_loss=0.0071, over 15062.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08915, pruned_loss=0.01192, audio_tagging_loss=0.008624, over 3058142.69 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:15:19,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3659186.6666666665, ans=0.125 2023-11-27 01:15:31,598 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.098e+01 9.034e+01 9.648e+01 1.056e+02 1.237e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-27 01:15:32,188 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.55 vs. limit=22.5 2023-11-27 01:15:39,507 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548900 2023-11-27 01:15:42,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3659320.0, ans=0.1 2023-11-27 01:16:05,709 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.25 vs. 
limit=15.0 2023-11-27 01:16:12,953 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7850, loss[loss=0.06218, simple_loss=0.08414, pruned_loss=0.01265, audio_tagging_loss=0.007462, over 14673.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08965, pruned_loss=0.01212, audio_tagging_loss=0.008722, over 3057245.54 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:16:20,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3659520.0, ans=0.0 2023-11-27 01:16:35,352 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 548950 2023-11-27 01:16:43,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3659653.3333333335, ans=0.0 2023-11-27 01:16:56,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3659786.6666666665, ans=0.0 2023-11-27 01:17:04,986 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.14 vs. limit=15.0 2023-11-27 01:17:05,104 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.45 vs. limit=6.0 2023-11-27 01:17:06,257 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.44 vs. limit=15.0 2023-11-27 01:17:08,646 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7900, loss[loss=0.06847, simple_loss=0.09614, pruned_loss=0.01116, audio_tagging_loss=0.009236, over 15753.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08904, pruned_loss=0.01215, audio_tagging_loss=0.008763, over 3056634.60 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:17:11,315 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.29 vs. limit=15.0 2023-11-27 01:17:23,298 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.697e+01 9.289e+01 9.929e+01 1.057e+02 1.408e+02, threshold=1.986e+02, percent-clipped=0.0 2023-11-27 01:17:31,358 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549000 2023-11-27 01:17:36,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3659986.6666666665, ans=0.0 2023-11-27 01:17:47,471 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.88 vs. limit=15.0 2023-11-27 01:17:52,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3660120.0, ans=0.125 2023-11-27 01:17:56,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3660120.0, ans=0.2 2023-11-27 01:18:04,845 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 7950, loss[loss=0.05258, simple_loss=0.06536, pruned_loss=0.0119, audio_tagging_loss=0.007995, over 13927.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08844, pruned_loss=0.01207, audio_tagging_loss=0.008923, over 3053808.47 frames. 
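Across these lines the printed learning rate drifts from lr: 1.47e-03 down to lr: 1.46e-03 without any explicit scheduler-step message: the rate decays smoothly in both the batch index and the epoch. The sketch below assumes an Eden-style schedule with constants (base_lr=0.045, lr_batches=7500, lr_epochs=3.5) chosen for illustration, since neither the formula nor the constants appear in this excerpt; under those assumptions it reproduces the logged value.

# Eden-style schedule (assumed): polynomial decay in batch and epoch.
def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"{eden_lr(0.045, batch=548000, epoch=45.5):.2e}")  # 1.46e-03, as logged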
], batch size: 53, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:18:09,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3660186.6666666665, ans=0.0 2023-11-27 01:18:18,140 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 01:18:18,343 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:18:26,833 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549050 2023-11-27 01:18:29,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3660320.0, ans=0.125 2023-11-27 01:18:32,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3660320.0, ans=0.0 2023-11-27 01:18:32,733 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:18:44,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3660386.6666666665, ans=0.1 2023-11-27 01:19:00,889 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8000, loss[loss=0.04488, simple_loss=0.05844, pruned_loss=0.00415, audio_tagging_loss=0.01151, over 15375.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.0877, pruned_loss=0.01183, audio_tagging_loss=0.008975, over 3047118.71 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 01:19:14,458 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 9.017e+01 9.575e+01 1.027e+02 1.291e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 01:19:15,645 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.13 vs. limit=15.0 2023-11-27 01:19:15,712 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.35 vs. limit=22.5 2023-11-27 01:19:16,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3660586.6666666665, ans=0.2 2023-11-27 01:19:22,510 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549100 2023-11-27 01:19:27,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3660653.3333333335, ans=0.05 2023-11-27 01:19:37,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3660720.0, ans=0.125 2023-11-27 01:19:54,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3660853.3333333335, ans=0.125 2023-11-27 01:19:55,674 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8050, loss[loss=0.07532, simple_loss=0.1045, pruned_loss=0.01557, audio_tagging_loss=0.007497, over 15977.00 frames. 
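The WARNING entries like the one above show why these 1-second AudioSet clips with the dummy transcript are dropped: 100 feature frames shrink to 23 after the encoder's 4x subsampling, while the placeholder text tokenizes to 24 BPE tokens, leaving fewer frames than tokens, which the pruned transducer loss presumably cannot align. A sketch of that filter follows; the subsampling arithmetic is an assumption chosen to match the logged 100 -> 23.

# Assumed subsampling arithmetic (matches 100 frames -> 23 frames):
def frames_after_subsampling(t: int) -> int:
    return ((t - 7) // 2 + 1) // 2

# Drop cuts whose subsampled length cannot cover the token sequence.
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23
print(keep_cut(100, 24))              # False -> excluded, as in the WARNING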
], tot_loss[loss=0.0648, simple_loss=0.08785, pruned_loss=0.01188, audio_tagging_loss=0.009002, over 3047451.69 frames. ], batch size: 61, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 01:20:00,171 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.17 vs. limit=15.0 2023-11-27 01:20:06,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3660920.0, ans=0.125 2023-11-27 01:20:09,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3660920.0, ans=0.1 2023-11-27 01:20:18,373 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549150 2023-11-27 01:20:49,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3661120.0, ans=0.0 2023-11-27 01:20:51,392 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.57 vs. limit=6.0 2023-11-27 01:20:51,912 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8100, loss[loss=0.0652, simple_loss=0.09645, pruned_loss=0.01015, audio_tagging_loss=0.006827, over 14068.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08861, pruned_loss=0.01204, audio_tagging_loss=0.008897, over 3044623.51 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:21:07,269 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.755e+01 8.808e+01 9.534e+01 1.042e+02 1.593e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 01:21:10,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3661253.3333333335, ans=0.1 2023-11-27 01:21:13,728 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549200 2023-11-27 01:21:28,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3661386.6666666665, ans=0.05 2023-11-27 01:21:47,912 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8150, loss[loss=0.07738, simple_loss=0.1113, pruned_loss=0.01547, audio_tagging_loss=0.006278, over 15788.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08934, pruned_loss=0.01216, audio_tagging_loss=0.008683, over 3048055.87 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:21:50,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3661520.0, ans=0.95 2023-11-27 01:22:02,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3661586.6666666665, ans=0.125 2023-11-27 01:22:09,106 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549250 2023-11-27 01:22:09,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3661653.3333333335, ans=0.125 2023-11-27 01:22:14,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3661653.3333333335, ans=0.1 2023-11-27 01:22:18,639 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.12 vs. 
limit=15.0 2023-11-27 01:22:41,932 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 01:22:42,947 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8200, loss[loss=0.06533, simple_loss=0.08351, pruned_loss=0.01302, audio_tagging_loss=0.01056, over 15233.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08954, pruned_loss=0.01215, audio_tagging_loss=0.00855, over 3049230.25 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:22:54,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3661920.0, ans=0.125 2023-11-27 01:22:55,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3661920.0, ans=0.125 2023-11-27 01:22:58,789 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.772e+01 8.840e+01 9.434e+01 1.030e+02 1.387e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-27 01:23:05,232 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549300 2023-11-27 01:23:13,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3661986.6666666665, ans=0.0 2023-11-27 01:23:20,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3662053.3333333335, ans=0.0 2023-11-27 01:23:30,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3662120.0, ans=0.125 2023-11-27 01:23:38,474 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8250, loss[loss=0.06651, simple_loss=0.08705, pruned_loss=0.01342, audio_tagging_loss=0.009564, over 16028.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08898, pruned_loss=0.0121, audio_tagging_loss=0.008547, over 3048575.59 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:23:50,073 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.23 vs. limit=10.0 2023-11-27 01:24:00,864 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549350 2023-11-27 01:24:12,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3662386.6666666665, ans=0.125 2023-11-27 01:24:34,719 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8300, loss[loss=0.08226, simple_loss=0.1064, pruned_loss=0.02049, audio_tagging_loss=0.008578, over 15181.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08884, pruned_loss=0.01208, audio_tagging_loss=0.008656, over 3049296.52 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:24:47,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3662586.6666666665, ans=0.125 2023-11-27 01:24:48,945 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.24 vs. 
limit=15.0 2023-11-27 01:24:49,550 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.731e+01 9.008e+01 9.718e+01 1.064e+02 1.333e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-27 01:24:49,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3662586.6666666665, ans=0.0 2023-11-27 01:24:54,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3662586.6666666665, ans=15.0 2023-11-27 01:24:56,117 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549400 2023-11-27 01:25:19,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3662786.6666666665, ans=0.0 2023-11-27 01:25:29,695 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8350, loss[loss=0.06762, simple_loss=0.09155, pruned_loss=0.01385, audio_tagging_loss=0.007997, over 15710.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08895, pruned_loss=0.01209, audio_tagging_loss=0.008577, over 3049187.77 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:25:32,240 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.84 vs. limit=15.0 2023-11-27 01:25:44,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3662920.0, ans=0.125 2023-11-27 01:25:52,446 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549450 2023-11-27 01:26:24,624 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8400, loss[loss=0.08969, simple_loss=0.129, pruned_loss=0.01824, audio_tagging_loss=0.006938, over 16059.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08873, pruned_loss=0.01215, audio_tagging_loss=0.008567, over 3050818.78 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:26:42,231 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.72 vs. limit=22.5 2023-11-27 01:26:42,698 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 8.598e+01 9.317e+01 1.002e+02 1.221e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-27 01:26:46,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3663253.3333333335, ans=10.0 2023-11-27 01:26:48,139 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549500 2023-11-27 01:26:53,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3663320.0, ans=0.2 2023-11-27 01:27:17,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3663453.3333333335, ans=0.125 2023-11-27 01:27:18,785 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2023-11-27 01:27:21,426 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8450, loss[loss=0.06899, simple_loss=0.09759, pruned_loss=0.009463, audio_tagging_loss=0.01073, over 15177.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08879, pruned_loss=0.01204, audio_tagging_loss=0.008605, over 3044735.30 frames. 
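The grad_scale field in the loss summaries moves among 32.0, 16.0 and 8.0 through this stretch, which is ordinary fp16 dynamic loss scaling: the scaler halves the scale whenever a step produces inf/nan gradients and grows it back after a sustained run of clean steps. Below is a generic torch.cuda.amp sketch, not the project's actual training loop; model, optimizer and batch are placeholders.

import torch

# Dynamic loss scaling: back off on overflow, grow after growth_interval
# clean steps (the constants here are illustrative).
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

def train_step(model, optimizer, batch, device="cuda"):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(batch.to(device))
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # skipped internally if grads contain inf/nan
    scaler.update()         # backs off or grows the scale accordingly
    return loss.detach(), scaler.get_scale()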
], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:27:42,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3663653.3333333335, ans=0.125 2023-11-27 01:27:42,505 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.65 vs. limit=15.0 2023-11-27 01:27:43,113 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549550 2023-11-27 01:27:48,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3663653.3333333335, ans=0.125 2023-11-27 01:28:01,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3663720.0, ans=0.0 2023-11-27 01:28:07,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3663786.6666666665, ans=15.0 2023-11-27 01:28:14,164 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.54 vs. limit=15.0 2023-11-27 01:28:16,775 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8500, loss[loss=0.04118, simple_loss=0.04752, pruned_loss=0.006062, audio_tagging_loss=0.01136, over 15805.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08837, pruned_loss=0.01198, audio_tagging_loss=0.008574, over 3041682.38 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:28:32,910 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.581e+01 8.917e+01 9.803e+01 1.059e+02 2.470e+02, threshold=1.961e+02, percent-clipped=1.0 2023-11-27 01:28:38,846 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549600 2023-11-27 01:28:58,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=3664053.3333333335, ans=0.05 2023-11-27 01:29:02,670 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2023-11-27 01:29:04,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3664120.0, ans=0.125 2023-11-27 01:29:10,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3664186.6666666665, ans=0.0 2023-11-27 01:29:10,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3664186.6666666665, ans=0.125 2023-11-27 01:29:11,615 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8550, loss[loss=0.06743, simple_loss=0.08824, pruned_loss=0.01463, audio_tagging_loss=0.008678, over 14532.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08822, pruned_loss=0.01197, audio_tagging_loss=0.008627, over 3041019.69 frames. 
], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:29:16,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3664186.6666666665, ans=0.125 2023-11-27 01:29:29,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3664253.3333333335, ans=0.125 2023-11-27 01:29:32,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3664253.3333333335, ans=0.0 2023-11-27 01:29:35,143 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549650 2023-11-27 01:29:40,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3664320.0, ans=0.2 2023-11-27 01:29:46,269 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.11 vs. limit=15.0 2023-11-27 01:29:57,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3664453.3333333335, ans=0.125 2023-11-27 01:30:07,884 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8600, loss[loss=0.07091, simple_loss=0.09943, pruned_loss=0.01229, audio_tagging_loss=0.008903, over 13662.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08837, pruned_loss=0.01183, audio_tagging_loss=0.008655, over 3045549.65 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:30:12,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3664520.0, ans=0.05 2023-11-27 01:30:16,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3664520.0, ans=0.1 2023-11-27 01:30:18,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3664586.6666666665, ans=0.0 2023-11-27 01:30:24,200 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.616e+01 8.820e+01 9.467e+01 9.988e+01 1.186e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-27 01:30:29,571 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549700 2023-11-27 01:30:31,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3664653.3333333335, ans=0.07 2023-11-27 01:30:33,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3664653.3333333335, ans=0.125 2023-11-27 01:30:44,980 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.53 vs. limit=22.5 2023-11-27 01:30:45,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3664720.0, ans=0.125 2023-11-27 01:30:55,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3664786.6666666665, ans=0.1 2023-11-27 01:31:03,576 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8650, loss[loss=0.04338, simple_loss=0.05637, pruned_loss=0.00613, audio_tagging_loss=0.00907, over 15907.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08829, pruned_loss=0.01177, audio_tagging_loss=0.008707, over 3048277.00 frames. 
], batch size: 63, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:31:10,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3664853.3333333335, ans=0.125 2023-11-27 01:31:19,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3664920.0, ans=0.125 2023-11-27 01:31:22,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3664920.0, ans=0.125 2023-11-27 01:31:26,026 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549750 2023-11-27 01:31:34,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3664986.6666666665, ans=0.125 2023-11-27 01:31:37,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3665053.3333333335, ans=0.95 2023-11-27 01:31:50,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3665120.0, ans=0.05 2023-11-27 01:31:58,571 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8700, loss[loss=0.06439, simple_loss=0.09009, pruned_loss=0.01179, audio_tagging_loss=0.007558, over 15573.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.0894, pruned_loss=0.01204, audio_tagging_loss=0.00873, over 3050946.36 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:31:59,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3665186.6666666665, ans=0.125 2023-11-27 01:32:08,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3665186.6666666665, ans=0.025 2023-11-27 01:32:08,273 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:32:15,442 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.944e+01 9.069e+01 9.762e+01 1.053e+02 1.470e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-27 01:32:18,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3665253.3333333335, ans=0.09899494936611666 2023-11-27 01:32:19,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3665253.3333333335, ans=0.125 2023-11-27 01:32:20,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3665320.0, ans=0.2 2023-11-27 01:32:20,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3665320.0, ans=0.1 2023-11-27 01:32:21,558 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549800 2023-11-27 01:32:24,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3665320.0, ans=0.0 2023-11-27 01:32:26,087 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.60 vs. 
limit=15.0 2023-11-27 01:32:43,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3665453.3333333335, ans=0.125 2023-11-27 01:32:55,288 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8750, loss[loss=0.06249, simple_loss=0.08981, pruned_loss=0.009931, audio_tagging_loss=0.007653, over 14644.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08935, pruned_loss=0.01215, audio_tagging_loss=0.008768, over 3055696.87 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:33:17,399 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549850 2023-11-27 01:33:26,172 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.72 vs. limit=12.0 2023-11-27 01:33:36,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3665720.0, ans=0.035 2023-11-27 01:33:45,974 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.39 vs. limit=15.0 2023-11-27 01:33:50,722 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8800, loss[loss=0.06556, simple_loss=0.08892, pruned_loss=0.01368, audio_tagging_loss=0.007431, over 15590.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09031, pruned_loss=0.01233, audio_tagging_loss=0.008783, over 3052660.59 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:33:51,855 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.52 vs. limit=6.0 2023-11-27 01:34:01,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3665920.0, ans=0.0 2023-11-27 01:34:01,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3665920.0, ans=0.5 2023-11-27 01:34:02,762 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.20 vs. limit=5.0 2023-11-27 01:34:08,226 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.927e+01 8.987e+01 9.532e+01 1.025e+02 1.979e+02, threshold=1.906e+02, percent-clipped=1.0 2023-11-27 01:34:13,063 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549900 2023-11-27 01:34:27,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3666053.3333333335, ans=0.125 2023-11-27 01:34:31,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3666053.3333333335, ans=0.125 2023-11-27 01:34:31,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3666053.3333333335, ans=0.2 2023-11-27 01:34:46,290 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8850, loss[loss=0.05859, simple_loss=0.08251, pruned_loss=0.008488, audio_tagging_loss=0.008849, over 15104.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09035, pruned_loss=0.01215, audio_tagging_loss=0.008872, over 3056359.87 frames. 
], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:34:49,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3666186.6666666665, ans=0.0 2023-11-27 01:34:54,886 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.52 vs. limit=15.0 2023-11-27 01:34:55,342 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 01:35:09,300 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 549950 2023-11-27 01:35:33,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3666453.3333333335, ans=0.0 2023-11-27 01:35:37,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3666453.3333333335, ans=0.1 2023-11-27 01:35:39,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3666453.3333333335, ans=0.1 2023-11-27 01:35:42,759 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8900, loss[loss=0.06461, simple_loss=0.09319, pruned_loss=0.01113, audio_tagging_loss=0.00688, over 14777.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09031, pruned_loss=0.01207, audio_tagging_loss=0.008804, over 3051165.67 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:35:45,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3666520.0, ans=0.125 2023-11-27 01:35:51,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3666520.0, ans=0.1 2023-11-27 01:35:58,639 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.73 vs. 
limit=12.0 2023-11-27 01:35:59,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3666586.6666666665, ans=0.1 2023-11-27 01:36:00,346 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.867e+01 8.952e+01 9.534e+01 1.026e+02 1.525e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 01:36:01,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3666586.6666666665, ans=0.125 2023-11-27 01:36:05,257 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550000 2023-11-27 01:36:05,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3666653.3333333335, ans=0.125 2023-11-27 01:36:06,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3666653.3333333335, ans=0.2 2023-11-27 01:36:26,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3666786.6666666665, ans=0.125 2023-11-27 01:36:28,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3666786.6666666665, ans=0.0 2023-11-27 01:36:34,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3666786.6666666665, ans=0.0 2023-11-27 01:36:37,224 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.44 vs. limit=15.0 2023-11-27 01:36:38,842 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 8950, loss[loss=0.06659, simple_loss=0.09315, pruned_loss=0.01281, audio_tagging_loss=0.007203, over 15513.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08927, pruned_loss=0.01176, audio_tagging_loss=0.00872, over 3053745.24 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:36:39,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3666853.3333333335, ans=0.125 2023-11-27 01:36:44,542 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.53 vs. limit=15.0 2023-11-27 01:36:48,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3666853.3333333335, ans=0.2 2023-11-27 01:36:53,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=3666920.0, ans=10.0 2023-11-27 01:36:54,563 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.43 vs. limit=15.0 2023-11-27 01:36:56,371 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:37:00,462 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550050 2023-11-27 01:37:32,711 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.13 vs. limit=15.0 2023-11-27 01:37:34,274 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9000, loss[loss=0.05072, simple_loss=0.07226, pruned_loss=0.006655, audio_tagging_loss=0.007938, over 14789.00 frames. 
], tot_loss[loss=0.0653, simple_loss=0.08932, pruned_loss=0.01192, audio_tagging_loss=0.008718, over 3051361.20 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:37:34,276 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-27 01:38:07,106 INFO [train_asr.py:1267] (0/4) Epoch 46, validation: loss=0.05879, simple_loss=0.05049, pruned_loss=0.005306, audio_tagging_loss=0.02824, over 4681554.00 frames. 2023-11-27 01:38:07,107 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-27 01:38:22,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3667253.3333333335, ans=0.2 2023-11-27 01:38:25,417 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 8.928e+01 9.533e+01 1.025e+02 1.320e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 01:38:29,411 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550100 2023-11-27 01:38:30,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3667320.0, ans=0.125 2023-11-27 01:38:49,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3667386.6666666665, ans=0.125 2023-11-27 01:38:57,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3667453.3333333335, ans=0.0 2023-11-27 01:39:02,707 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9050, loss[loss=0.04408, simple_loss=0.05939, pruned_loss=0.005939, audio_tagging_loss=0.008444, over 18054.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08962, pruned_loss=0.01198, audio_tagging_loss=0.008572, over 3059769.72 frames. ], batch size: 70, lr: 1.46e-03, grad_scale: 4.0 2023-11-27 01:39:09,383 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.95 vs. limit=10.0 2023-11-27 01:39:12,229 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.26 vs. limit=15.0 2023-11-27 01:39:18,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3667586.6666666665, ans=0.2 2023-11-27 01:39:21,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3667586.6666666665, ans=0.125 2023-11-27 01:39:24,579 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550150 2023-11-27 01:39:28,735 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.87 vs. limit=15.0 2023-11-27 01:39:58,464 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9100, loss[loss=0.07181, simple_loss=0.09345, pruned_loss=0.01288, audio_tagging_loss=0.0122, over 15326.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08947, pruned_loss=0.01187, audio_tagging_loss=0.008554, over 3058302.73 frames. 
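
The 'Computing validation loss' / 'validation: loss=...' pair above is a periodic held-out pass over the eval cuts, reporting per-frame-normalized losses plus peak GPU memory. A hedged sketch of what such a pass typically looks like; the model call signature is assumed, not taken from train_asr.py.

    import torch

    def validate(model, valid_loader, device):
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_loader:
                feats = batch["features"].to(device)    # (N, T, 80) fbank
                loss, num_frames = model(feats, batch)  # assumed signature
                tot_loss += loss.item()
                tot_frames += num_frames
        model.train()
        peak_mb = torch.cuda.max_memory_allocated(device) // (1024 ** 2)
        return tot_loss / tot_frames, peak_mb  # per-frame loss, peak memory in MB

torch.cuda.max_memory_allocated() is the source of figures like the 25978MB reported above.
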
], batch size: 56, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:40:01,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3667853.3333333335, ans=0.125 2023-11-27 01:40:12,833 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0 2023-11-27 01:40:13,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3667920.0, ans=0.1 2023-11-27 01:40:19,179 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.684e+01 9.136e+01 9.567e+01 1.016e+02 1.322e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-27 01:40:21,402 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550200 2023-11-27 01:40:34,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3668053.3333333335, ans=0.125 2023-11-27 01:40:40,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3668053.3333333335, ans=0.125 2023-11-27 01:40:49,701 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:40:50,002 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2023-11-27 01:40:54,729 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9150, loss[loss=0.06568, simple_loss=0.09807, pruned_loss=0.009912, audio_tagging_loss=0.006737, over 14479.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08946, pruned_loss=0.01181, audio_tagging_loss=0.008453, over 3051714.45 frames. ], batch size: 53, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:41:09,778 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=22.5 2023-11-27 01:41:16,509 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550250 2023-11-27 01:41:27,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3668386.6666666665, ans=0.2 2023-11-27 01:41:40,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3668453.3333333335, ans=0.1 2023-11-27 01:41:41,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3668453.3333333335, ans=0.2 2023-11-27 01:41:46,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3668453.3333333335, ans=0.95 2023-11-27 01:41:47,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3668453.3333333335, ans=0.125 2023-11-27 01:41:50,293 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9200, loss[loss=0.06373, simple_loss=0.08604, pruned_loss=0.01347, audio_tagging_loss=0.007236, over 15032.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08976, pruned_loss=0.0119, audio_tagging_loss=0.008439, over 3049097.90 frames. 
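
The optim.py 'Clipping_scale=2.0, grad-norm quartiles ... threshold=... percent-clipped=...' records are consistent with a clipping threshold of clipping_scale times the running median gradient norm: 2.0 x ~9.5e+01 ~= 1.9e+02 throughout this stretch. A sketch under that assumption; this is not the actual optimizer code.

    import torch
    from collections import deque

    class QuartileClipper:
        """Clip gradients to clipping_scale * median of recent gradient norms,
        and report quartiles / percent clipped in the style of the log above."""
        def __init__(self, clipping_scale=2.0, window=128):
            self.scale = clipping_scale
            self.norms = deque(maxlen=window)
            self.n_clipped = 0
            self.n_steps = 0

        def step(self, params):  # params: a list of Parameters with .grad set
            # max_norm=inf turns clip_grad_norm_ into a pure norm measurement
            norm = torch.nn.utils.clip_grad_norm_(params, float("inf")).item()
            self.norms.append(norm)
            q = sorted(self.norms)
            quartiles = [q[int(p * (len(q) - 1))] for p in (0, 0.25, 0.5, 0.75, 1)]
            threshold = self.scale * quartiles[2]   # 2.0 * median, ~1.9e+02 here
            self.n_steps += 1
            if norm > threshold:
                self.n_clipped += 1
                for p in params:
                    if p.grad is not None:
                        p.grad.mul_(threshold / norm)
            return quartiles, threshold, 100.0 * self.n_clipped / self.n_steps
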
], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:42:06,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3668586.6666666665, ans=0.0 2023-11-27 01:42:09,837 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.485e+01 8.971e+01 9.683e+01 1.056e+02 2.334e+02, threshold=1.937e+02, percent-clipped=1.0 2023-11-27 01:42:12,048 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550300 2023-11-27 01:42:14,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3668653.3333333335, ans=0.125 2023-11-27 01:42:15,345 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:42:45,342 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9250, loss[loss=0.05656, simple_loss=0.07994, pruned_loss=0.008544, audio_tagging_loss=0.008047, over 15369.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08906, pruned_loss=0.0118, audio_tagging_loss=0.008514, over 3051365.40 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:42:49,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3668853.3333333335, ans=0.125 2023-11-27 01:43:04,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3668920.0, ans=0.1 2023-11-27 01:43:04,659 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.12 vs. limit=15.0 2023-11-27 01:43:08,929 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550350 2023-11-27 01:43:29,684 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.67 vs. limit=15.0 2023-11-27 01:43:35,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3669120.0, ans=0.125 2023-11-27 01:43:41,739 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9300, loss[loss=0.07894, simple_loss=0.1083, pruned_loss=0.01664, audio_tagging_loss=0.008156, over 14053.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08898, pruned_loss=0.01188, audio_tagging_loss=0.008662, over 3047605.97 frames. 
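
The scaling.py Whitening records compare a per-module whitening metric against a fixed limit ('metric=X vs. limit=Y'). One plausible reading, consistent with the metrics bottoming out near 1.0, is that the metric measures how anisotropic the covariance of a module's output is, with 1.0 meaning perfectly white, and that a penalty is applied only when the limit is exceeded. A speculative sketch of such a metric, not the actual scaling.py code:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        """x: (num_frames, num_channels). Returns >= 1.0, with 1.0 iff the
        covariance is isotropic (white) within each channel group."""
        n, c = x.shape
        assert c % num_groups == 0
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        cov = torch.matmul(x.transpose(1, 2), x) / n   # (groups, c/g, c/g)
        eigs = torch.linalg.eigvalsh(cov)              # real, per group
        # mean squared eigenvalue over squared mean eigenvalue (>= 1 by C-S):
        metric = (eigs ** 2).mean(dim=1) / eigs.mean(dim=1).clamp(min=1e-20) ** 2
        return metric.mean().item()
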
], batch size: 52, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:44:01,829 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.974e+01 8.933e+01 9.435e+01 1.011e+02 1.310e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-27 01:44:04,052 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550400 2023-11-27 01:44:11,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3669320.0, ans=0.2 2023-11-27 01:44:17,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3669386.6666666665, ans=0.125 2023-11-27 01:44:35,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3669453.3333333335, ans=0.2 2023-11-27 01:44:36,129 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:44:37,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3669520.0, ans=10.0 2023-11-27 01:44:38,041 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9350, loss[loss=0.0821, simple_loss=0.1149, pruned_loss=0.01551, audio_tagging_loss=0.009145, over 16679.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.08843, pruned_loss=0.01185, audio_tagging_loss=0.008709, over 3045701.34 frames. ], batch size: 66, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:44:59,449 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550450 2023-11-27 01:45:02,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3669653.3333333335, ans=0.125 2023-11-27 01:45:03,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3669653.3333333335, ans=0.125 2023-11-27 01:45:06,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3669653.3333333335, ans=0.125 2023-11-27 01:45:14,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3669720.0, ans=0.125 2023-11-27 01:45:24,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3669786.6666666665, ans=0.0 2023-11-27 01:45:27,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3669786.6666666665, ans=0.0 2023-11-27 01:45:33,202 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9400, loss[loss=0.08037, simple_loss=0.1046, pruned_loss=0.01821, audio_tagging_loss=0.009864, over 14722.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08944, pruned_loss=0.01202, audio_tagging_loss=0.008685, over 3051334.76 frames. 
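
The four loss fields in each per-batch record are tied together by a fixed weighting: in this section, loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss to within rounding, i.e. a pruned-RNN-T objective with the simple (lattice) loss down-weighted and the audio-tagging distillation term added at full scale. A quick check against the batch 9350 record above:

    # per-batch numbers logged for epoch 46, batch 9350:
    simple_loss, pruned_loss, audio_tagging_loss = 0.1149, 0.01551, 0.009145
    loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss
    print(round(loss, 4))   # -> 0.0821, matching the logged loss

The same identity holds for the other per-batch and tot_loss records in this section.
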
], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:45:50,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3669920.0, ans=0.125 2023-11-27 01:45:54,110 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.770e+01 8.852e+01 9.637e+01 1.052e+02 1.350e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-27 01:45:56,319 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550500 2023-11-27 01:46:10,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3670053.3333333335, ans=0.125 2023-11-27 01:46:25,297 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 01:46:29,037 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9450, loss[loss=0.07127, simple_loss=0.09562, pruned_loss=0.01548, audio_tagging_loss=0.007972, over 14171.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08937, pruned_loss=0.012, audio_tagging_loss=0.008834, over 3049295.55 frames. ], batch size: 53, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:46:31,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3670186.6666666665, ans=0.125 2023-11-27 01:46:42,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3670253.3333333335, ans=0.125 2023-11-27 01:46:49,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3670253.3333333335, ans=0.1 2023-11-27 01:46:51,969 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550550 2023-11-27 01:47:20,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3670453.3333333335, ans=0.2 2023-11-27 01:47:22,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3670453.3333333335, ans=0.0 2023-11-27 01:47:25,296 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.35 vs. limit=10.0 2023-11-27 01:47:25,717 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9500, loss[loss=0.06696, simple_loss=0.09135, pruned_loss=0.01304, audio_tagging_loss=0.008242, over 15197.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09021, pruned_loss=0.01217, audio_tagging_loss=0.00882, over 3051191.48 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:47:32,603 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.93 vs. 
limit=22.5 2023-11-27 01:47:37,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3670586.6666666665, ans=0.125 2023-11-27 01:47:38,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3670586.6666666665, ans=0.125 2023-11-27 01:47:38,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3670586.6666666665, ans=0.125 2023-11-27 01:47:43,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=3670586.6666666665, ans=22.5 2023-11-27 01:47:44,654 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.655e+01 9.026e+01 9.482e+01 1.013e+02 1.263e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-27 01:47:46,837 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550600 2023-11-27 01:48:02,634 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.19 vs. limit=15.0 2023-11-27 01:48:04,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3670720.0, ans=0.0 2023-11-27 01:48:13,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3670786.6666666665, ans=0.125 2023-11-27 01:48:18,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3670786.6666666665, ans=0.0 2023-11-27 01:48:20,639 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9550, loss[loss=0.04187, simple_loss=0.05296, pruned_loss=0.006189, audio_tagging_loss=0.0092, over 13865.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09067, pruned_loss=0.01219, audio_tagging_loss=0.008769, over 3050810.04 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:48:26,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3670853.3333333335, ans=0.2 2023-11-27 01:48:28,530 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.81 vs. limit=22.5 2023-11-27 01:48:33,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3670920.0, ans=0.0 2023-11-27 01:48:43,572 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550650 2023-11-27 01:48:55,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3671053.3333333335, ans=0.2 2023-11-27 01:48:57,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3671053.3333333335, ans=0.125 2023-11-27 01:49:07,882 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2023-11-27 01:49:08,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3671120.0, ans=0.125 2023-11-27 01:49:15,927 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9600, loss[loss=0.06817, simple_loss=0.08715, pruned_loss=0.01285, audio_tagging_loss=0.01175, over 15958.00 frames. 
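
The grad_scale field in the per-batch records (32.0 at batch 9600, but 16.0, 8.0, and 4.0 elsewhere in this section) is the dynamic loss scale of fp16 mixed-precision training: it is halved whenever gradients overflow and periodically doubled back. A minimal torch.cuda.amp sketch of the mechanism; the training-step shape is illustrative, not the training script itself.

    import torch

    scaler = torch.cuda.amp.GradScaler()   # dynamic fp16 loss scaling

    def train_step(model, optimizer, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch)            # assumed forward returning the loss
        scaler.scale(loss).backward()      # backprop through the scaled loss
        scaler.step(optimizer)             # unscales grads; skips step on inf/nan
        scaler.update()                    # halve on overflow, else slowly grow
        return scaler.get_scale()          # the value logged as grad_scale
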
], tot_loss[loss=0.06626, simple_loss=0.09059, pruned_loss=0.01212, audio_tagging_loss=0.008842, over 3051780.80 frames. ], batch size: 61, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 01:49:37,137 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.585e+01 8.787e+01 9.468e+01 1.030e+02 1.227e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-27 01:49:39,340 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550700 2023-11-27 01:50:12,745 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9650, loss[loss=0.07897, simple_loss=0.114, pruned_loss=0.01405, audio_tagging_loss=0.007897, over 15719.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09039, pruned_loss=0.01207, audio_tagging_loss=0.00879, over 3045939.44 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 01:50:34,450 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550750 2023-11-27 01:50:34,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3671653.3333333335, ans=0.125 2023-11-27 01:51:05,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3671786.6666666665, ans=0.0 2023-11-27 01:51:08,165 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9700, loss[loss=0.05777, simple_loss=0.08546, pruned_loss=0.006761, audio_tagging_loss=0.008282, over 15018.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09114, pruned_loss=0.01229, audio_tagging_loss=0.008574, over 3046640.67 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:51:11,128 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.07 vs. limit=10.0 2023-11-27 01:51:16,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3671853.3333333335, ans=0.2 2023-11-27 01:51:28,964 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.432e+01 8.991e+01 9.696e+01 1.056e+02 1.366e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-27 01:51:30,118 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550800 2023-11-27 01:51:54,498 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.82 vs. limit=6.0 2023-11-27 01:52:03,768 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9750, loss[loss=0.08269, simple_loss=0.1095, pruned_loss=0.02053, audio_tagging_loss=0.007413, over 15416.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.0902, pruned_loss=0.01205, audio_tagging_loss=0.00856, over 3044338.58 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:52:16,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3672253.3333333335, ans=0.125 2023-11-27 01:52:17,042 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.66 vs. 
limit=12.0 2023-11-27 01:52:18,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3672253.3333333335, ans=0.125 2023-11-27 01:52:27,141 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550850 2023-11-27 01:52:29,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3672320.0, ans=0.1 2023-11-27 01:52:59,828 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9800, loss[loss=0.06879, simple_loss=0.09549, pruned_loss=0.01528, audio_tagging_loss=0.005758, over 14544.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.0888, pruned_loss=0.01181, audio_tagging_loss=0.008491, over 3047335.69 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:53:20,920 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.534e+01 8.968e+01 9.826e+01 1.047e+02 1.265e+02, threshold=1.965e+02, percent-clipped=0.0 2023-11-27 01:53:22,067 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550900 2023-11-27 01:53:25,991 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.22 vs. limit=22.5 2023-11-27 01:53:43,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=3672786.6666666665, ans=0.1 2023-11-27 01:53:44,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3672786.6666666665, ans=0.07 2023-11-27 01:53:47,919 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 01:53:51,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3672786.6666666665, ans=0.1 2023-11-27 01:53:53,994 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=22.5 2023-11-27 01:53:55,689 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9850, loss[loss=0.07964, simple_loss=0.1072, pruned_loss=0.01627, audio_tagging_loss=0.009783, over 14893.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.0889, pruned_loss=0.0118, audio_tagging_loss=0.008566, over 3046644.29 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:54:01,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3672853.3333333335, ans=0.1 2023-11-27 01:54:17,494 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 550950 2023-11-27 01:54:19,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3672986.6666666665, ans=0.0 2023-11-27 01:54:21,550 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.08 vs. 
limit=15.0 2023-11-27 01:54:45,841 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=22.5 2023-11-27 01:54:50,666 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9900, loss[loss=0.04521, simple_loss=0.06222, pruned_loss=0.005753, audio_tagging_loss=0.008346, over 14899.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08915, pruned_loss=0.01179, audio_tagging_loss=0.008546, over 3043702.18 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:54:50,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3673186.6666666665, ans=0.125 2023-11-27 01:54:56,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3673186.6666666665, ans=10.0 2023-11-27 01:55:08,083 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:55:13,595 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.560e+01 8.996e+01 9.617e+01 1.030e+02 1.836e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-27 01:55:13,727 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551000 2023-11-27 01:55:22,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3673320.0, ans=0.125 2023-11-27 01:55:47,251 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 9950, loss[loss=0.07945, simple_loss=0.1194, pruned_loss=0.01392, audio_tagging_loss=0.005815, over 17107.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08873, pruned_loss=0.01181, audio_tagging_loss=0.008542, over 3047265.92 frames. ], batch size: 63, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:55:57,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3673586.6666666665, ans=0.1 2023-11-27 01:56:09,643 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551050 2023-11-27 01:56:33,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3673786.6666666665, ans=0.0 2023-11-27 01:56:42,971 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10000, loss[loss=0.08028, simple_loss=0.1178, pruned_loss=0.01547, audio_tagging_loss=0.005911, over 15021.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08873, pruned_loss=0.01191, audio_tagging_loss=0.008634, over 3044181.97 frames. 
], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:56:46,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3673853.3333333335, ans=0.015 2023-11-27 01:56:59,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3673920.0, ans=0.125 2023-11-27 01:57:05,440 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.059e+01 8.920e+01 9.463e+01 1.026e+02 1.255e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-27 01:57:05,540 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551100 2023-11-27 01:57:14,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3673986.6666666665, ans=0.125 2023-11-27 01:57:17,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3674053.3333333335, ans=0.125 2023-11-27 01:57:21,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3674053.3333333335, ans=0.2 2023-11-27 01:57:32,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3674120.0, ans=0.125 2023-11-27 01:57:38,059 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.09 vs. limit=15.0 2023-11-27 01:57:38,694 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10050, loss[loss=0.06577, simple_loss=0.07909, pruned_loss=0.01242, audio_tagging_loss=0.01381, over 15073.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08983, pruned_loss=0.01195, audio_tagging_loss=0.008524, over 3049643.31 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:58:01,639 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551150 2023-11-27 01:58:11,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3674386.6666666665, ans=0.125 2023-11-27 01:58:34,212 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10100, loss[loss=0.06399, simple_loss=0.09114, pruned_loss=0.008293, audio_tagging_loss=0.01013, over 15045.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.09034, pruned_loss=0.01201, audio_tagging_loss=0.008569, over 3053473.60 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:58:36,778 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2023-11-27 01:58:39,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3674520.0, ans=0.1 2023-11-27 01:58:43,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3674520.0, ans=0.2 2023-11-27 01:58:57,103 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.829e+01 8.911e+01 9.483e+01 1.012e+02 1.276e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-27 01:58:57,206 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551200 2023-11-27 01:59:01,263 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. 
limit=6.0 2023-11-27 01:59:14,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3674720.0, ans=10.0 2023-11-27 01:59:18,100 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 01:59:23,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3674786.6666666665, ans=0.125 2023-11-27 01:59:30,770 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10150, loss[loss=0.05088, simple_loss=0.06618, pruned_loss=0.00829, audio_tagging_loss=0.0095, over 15521.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.09013, pruned_loss=0.01202, audio_tagging_loss=0.00865, over 3054723.95 frames. ], batch size: 61, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:59:42,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3674920.0, ans=0.0 2023-11-27 01:59:52,886 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551250 2023-11-27 01:59:55,506 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 02:00:09,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3675053.3333333335, ans=0.0 2023-11-27 02:00:24,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3675120.0, ans=0.2 2023-11-27 02:00:26,865 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10200, loss[loss=0.07979, simple_loss=0.1096, pruned_loss=0.01706, audio_tagging_loss=0.007939, over 15812.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.09024, pruned_loss=0.01212, audio_tagging_loss=0.008691, over 3059448.05 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:00:34,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3675186.6666666665, ans=0.1 2023-11-27 02:00:45,950 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 02:00:49,091 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.594e+01 8.864e+01 9.560e+01 1.043e+02 1.445e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-27 02:00:49,188 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551300 2023-11-27 02:00:59,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3675386.6666666665, ans=0.125 2023-11-27 02:01:01,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3675386.6666666665, ans=0.125 2023-11-27 02:01:19,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3675453.3333333335, ans=0.1 2023-11-27 02:01:21,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3675520.0, ans=0.125 2023-11-27 02:01:22,518 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10250, loss[loss=0.09061, simple_loss=0.1285, pruned_loss=0.01842, audio_tagging_loss=0.007933, over 15529.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09081, pruned_loss=0.01225, audio_tagging_loss=0.008653, over 3061140.94 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:01:22,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3675520.0, ans=0.125 2023-11-27 02:01:36,852 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.46 vs. limit=15.0 2023-11-27 02:01:41,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3675586.6666666665, ans=0.04949747468305833 2023-11-27 02:01:44,915 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551350 2023-11-27 02:01:49,631 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.67 vs. limit=10.0 2023-11-27 02:02:15,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3675786.6666666665, ans=0.125 2023-11-27 02:02:17,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3675853.3333333335, ans=0.0 2023-11-27 02:02:18,473 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10300, loss[loss=0.06842, simple_loss=0.0882, pruned_loss=0.01666, audio_tagging_loss=0.00766, over 14156.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09062, pruned_loss=0.01215, audio_tagging_loss=0.008678, over 3059594.42 frames. 
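
The recurring 'Exclude cut ... Dummy text added as a place holder' warnings are AudioSet clips, used here for the audio-tagging objective, whose placeholder transcript is longer than the encoder output: after frontend subsampling a 100-frame cut yields 23 frames, fewer than its 24 BPE tokens, and a transducer loss cannot emit more tokens than it has frames. The 100 -> 23 mapping is consistent with a ((T - 7) // 2 + 1) // 2 subsampling formula; a sketch of such a filter, with the formula assumed rather than quoted from the recipe:

    def frames_after_subsampling(num_frames: int) -> int:
        # assumed frontend formula, consistent with the logged 100 -> 23 mapping
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, tokens: list) -> bool:
        # transducer alignment needs at least one output frame per token
        return frames_after_subsampling(num_frames) >= len(tokens)

    print(frames_after_subsampling(100))   # -> 23
    print(keep_cut(100, ["tok"] * 24))     # -> False: excluded, as logged
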
], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:02:31,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3675920.0, ans=0.125 2023-11-27 02:02:40,220 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.373e+01 8.971e+01 9.641e+01 1.026e+02 1.769e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-27 02:02:40,310 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551400 2023-11-27 02:02:46,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3675986.6666666665, ans=0.0 2023-11-27 02:02:58,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3676053.3333333335, ans=0.125 2023-11-27 02:03:13,899 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10350, loss[loss=0.05309, simple_loss=0.07, pruned_loss=0.008706, audio_tagging_loss=0.009384, over 15264.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09065, pruned_loss=0.01214, audio_tagging_loss=0.008746, over 3060120.79 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:03:21,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3676186.6666666665, ans=0.125 2023-11-27 02:03:26,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3676253.3333333335, ans=0.125 2023-11-27 02:03:32,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3676253.3333333335, ans=0.125 2023-11-27 02:03:36,796 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551450 2023-11-27 02:03:43,013 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.40 vs. limit=15.0 2023-11-27 02:03:48,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3676386.6666666665, ans=0.0 2023-11-27 02:04:00,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3676453.3333333335, ans=0.125 2023-11-27 02:04:08,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3676520.0, ans=0.125 2023-11-27 02:04:09,397 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10400, loss[loss=0.06971, simple_loss=0.0925, pruned_loss=0.01109, audio_tagging_loss=0.01237, over 14537.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09071, pruned_loss=0.01221, audio_tagging_loss=0.008793, over 3053590.08 frames. 
], batch size: 53, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:04:31,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3676653.3333333335, ans=0.0 2023-11-27 02:04:32,026 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.446e+01 9.109e+01 9.691e+01 1.057e+02 2.130e+02, threshold=1.938e+02, percent-clipped=1.0 2023-11-27 02:04:32,122 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551500 2023-11-27 02:04:41,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3676720.0, ans=0.125 2023-11-27 02:05:05,160 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10450, loss[loss=0.07424, simple_loss=0.1037, pruned_loss=0.01357, audio_tagging_loss=0.00884, over 15767.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09044, pruned_loss=0.01212, audio_tagging_loss=0.008862, over 3043834.25 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:05:08,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3676853.3333333335, ans=0.125 2023-11-27 02:05:26,715 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551550 2023-11-27 02:05:26,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3676986.6666666665, ans=0.125 2023-11-27 02:05:40,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3677053.3333333335, ans=10.0 2023-11-27 02:06:00,586 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10500, loss[loss=0.0528, simple_loss=0.06973, pruned_loss=0.008975, audio_tagging_loss=0.008958, over 16729.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08937, pruned_loss=0.01197, audio_tagging_loss=0.008846, over 3040659.97 frames. ], batch size: 63, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:06:01,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3677186.6666666665, ans=0.05 2023-11-27 02:06:02,047 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. limit=6.0 2023-11-27 02:06:14,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3677253.3333333335, ans=0.125 2023-11-27 02:06:22,966 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.664e+01 9.041e+01 9.594e+01 1.033e+02 2.053e+02, threshold=1.919e+02, percent-clipped=1.0 2023-11-27 02:06:23,072 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551600 2023-11-27 02:06:51,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3677453.3333333335, ans=0.0 2023-11-27 02:06:56,016 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10550, loss[loss=0.09862, simple_loss=0.1372, pruned_loss=0.02518, audio_tagging_loss=0.004847, over 15643.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.0898, pruned_loss=0.01207, audio_tagging_loss=0.008608, over 3047287.58 frames. 
], batch size: 56, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:07:10,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3677586.6666666665, ans=10.0 2023-11-27 02:07:14,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3677586.6666666665, ans=0.035 2023-11-27 02:07:19,473 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551650 2023-11-27 02:07:35,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3677720.0, ans=0.0 2023-11-27 02:07:52,393 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.40 vs. limit=22.5 2023-11-27 02:07:53,004 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10600, loss[loss=0.05644, simple_loss=0.08123, pruned_loss=0.00846, audio_tagging_loss=0.007364, over 15995.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08976, pruned_loss=0.012, audio_tagging_loss=0.008547, over 3047274.63 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:07:59,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3677853.3333333335, ans=0.0 2023-11-27 02:08:00,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3677853.3333333335, ans=0.0 2023-11-27 02:08:14,800 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.458e+01 8.957e+01 9.483e+01 1.042e+02 1.260e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-27 02:08:14,896 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551700 2023-11-27 02:08:15,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3677986.6666666665, ans=0.125 2023-11-27 02:08:34,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3678053.3333333335, ans=0.125 2023-11-27 02:08:48,503 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10650, loss[loss=0.07086, simple_loss=0.09713, pruned_loss=0.01227, audio_tagging_loss=0.01002, over 15951.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.09022, pruned_loss=0.01217, audio_tagging_loss=0.008481, over 3048192.78 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:09:04,870 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.38 vs. limit=15.0 2023-11-27 02:09:10,284 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551750 2023-11-27 02:09:18,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3678320.0, ans=0.0 2023-11-27 02:09:23,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3678386.6666666665, ans=0.2 2023-11-27 02:09:42,998 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10700, loss[loss=0.06538, simple_loss=0.09696, pruned_loss=0.01182, audio_tagging_loss=0.005078, over 15039.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.09024, pruned_loss=0.01217, audio_tagging_loss=0.008465, over 3048718.82 frames. 
], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:09:52,195 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.82 vs. limit=6.0 2023-11-27 02:10:06,393 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551800 2023-11-27 02:10:07,317 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.732e+01 8.981e+01 9.456e+01 1.028e+02 1.264e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-27 02:10:21,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3678720.0, ans=0.125 2023-11-27 02:10:21,817 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=15.0 2023-11-27 02:10:25,887 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:10:40,268 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10750, loss[loss=0.07039, simple_loss=0.09915, pruned_loss=0.01302, audio_tagging_loss=0.007794, over 15371.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.09012, pruned_loss=0.01208, audio_tagging_loss=0.008463, over 3052498.57 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:10:53,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3678920.0, ans=0.125 2023-11-27 02:11:01,941 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551850 2023-11-27 02:11:05,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3678986.6666666665, ans=0.0 2023-11-27 02:11:11,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3679053.3333333335, ans=0.2 2023-11-27 02:11:15,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3679053.3333333335, ans=0.0 2023-11-27 02:11:18,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3679053.3333333335, ans=0.0 2023-11-27 02:11:24,751 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.16 vs. limit=15.0 2023-11-27 02:11:35,275 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10800, loss[loss=0.06836, simple_loss=0.1018, pruned_loss=0.01103, audio_tagging_loss=0.006439, over 15608.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08984, pruned_loss=0.01196, audio_tagging_loss=0.008442, over 3048790.59 frames. 
], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:11:41,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3679186.6666666665, ans=0.125 2023-11-27 02:11:57,088 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551900 2023-11-27 02:11:59,123 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.497e+01 8.776e+01 9.602e+01 1.034e+02 1.420e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-27 02:12:15,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3679386.6666666665, ans=0.0 2023-11-27 02:12:18,288 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2023-11-27 02:12:30,750 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10850, loss[loss=0.0474, simple_loss=0.06064, pruned_loss=0.007882, audio_tagging_loss=0.009202, over 14189.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08936, pruned_loss=0.01188, audio_tagging_loss=0.008526, over 3049968.33 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:12:32,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3679520.0, ans=0.0 2023-11-27 02:12:54,121 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 551950 2023-11-27 02:12:56,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3679653.3333333335, ans=0.125 2023-11-27 02:12:59,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3679653.3333333335, ans=0.09899494936611666 2023-11-27 02:13:07,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3679720.0, ans=0.0 2023-11-27 02:13:09,340 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.46 vs. limit=22.5 2023-11-27 02:13:19,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3679786.6666666665, ans=0.0 2023-11-27 02:13:20,993 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 02:13:26,286 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10900, loss[loss=0.04589, simple_loss=0.05808, pruned_loss=0.009358, audio_tagging_loss=0.007491, over 14367.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.0891, pruned_loss=0.01179, audio_tagging_loss=0.008477, over 3043890.96 frames. 
], batch size: 53, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:13:28,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3679853.3333333335, ans=0.07 2023-11-27 02:13:48,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3679986.6666666665, ans=0.125 2023-11-27 02:13:49,340 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552000 2023-11-27 02:13:49,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3679986.6666666665, ans=0.1 2023-11-27 02:13:50,676 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-552000.pt 2023-11-27 02:13:53,600 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.440e+01 8.930e+01 9.586e+01 1.062e+02 1.591e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-27 02:14:02,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3680053.3333333335, ans=0.125 2023-11-27 02:14:02,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3680053.3333333335, ans=0.125 2023-11-27 02:14:10,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3680053.3333333335, ans=0.125 2023-11-27 02:14:13,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3680120.0, ans=0.1 2023-11-27 02:14:24,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3680186.6666666665, ans=0.125 2023-11-27 02:14:25,459 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 10950, loss[loss=0.06012, simple_loss=0.07906, pruned_loss=0.01129, audio_tagging_loss=0.009299, over 14722.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08953, pruned_loss=0.01195, audio_tagging_loss=0.008458, over 3053918.91 frames. 
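
The checkpoint.py line above writes a rolling batch-level checkpoint (here at batch idx 552000, on a fixed every-N-batches cadence, separate from the per-epoch checkpoints). A minimal sketch with illustrative names; the saved fields are assumptions about what such a checkpoint typically carries:

    import torch

    def maybe_save_checkpoint(model, optimizer, scheduler, batch_idx, every_n, exp_dir):
        # assumed cadence: fires when batch_idx is a multiple of every_n
        if batch_idx % every_n != 0:
            return
        torch.save(
            {
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "scheduler": scheduler.state_dict(),
                "batch_idx_train": batch_idx,
            },
            f"{exp_dir}/checkpoint-{batch_idx}.pt",
        )
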
], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:14:29,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3680186.6666666665, ans=0.125 2023-11-27 02:14:35,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3680253.3333333335, ans=0.125 2023-11-27 02:14:46,788 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552050 2023-11-27 02:14:52,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3680320.0, ans=0.125 2023-11-27 02:14:54,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3680320.0, ans=0.125 2023-11-27 02:15:16,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3680453.3333333335, ans=0.125 2023-11-27 02:15:18,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3680453.3333333335, ans=0.0 2023-11-27 02:15:20,774 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11000, loss[loss=0.0654, simple_loss=0.08904, pruned_loss=0.01151, audio_tagging_loss=0.009375, over 14429.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08968, pruned_loss=0.01203, audio_tagging_loss=0.008488, over 3055415.49 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:15:21,361 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.47 vs. limit=10.0 2023-11-27 02:15:27,171 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 02:15:28,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3680520.0, ans=0.125 2023-11-27 02:15:41,774 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:15:43,605 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552100 2023-11-27 02:15:46,144 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.719e+01 8.927e+01 9.605e+01 1.045e+02 1.330e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-27 02:15:56,218 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.15 vs. 
limit=15.0 2023-11-27 02:15:57,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3680720.0, ans=0.125 2023-11-27 02:16:05,466 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:16:09,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3680786.6666666665, ans=0.125 2023-11-27 02:16:11,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3680786.6666666665, ans=0.2 2023-11-27 02:16:16,362 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11050, loss[loss=0.09005, simple_loss=0.1211, pruned_loss=0.02028, audio_tagging_loss=0.009218, over 15014.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.0897, pruned_loss=0.01212, audio_tagging_loss=0.008634, over 3051922.18 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:16:28,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3680920.0, ans=0.2 2023-11-27 02:16:32,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3680920.0, ans=0.125 2023-11-27 02:16:34,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3680920.0, ans=0.125 2023-11-27 02:16:34,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3680920.0, ans=0.125 2023-11-27 02:16:39,086 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552150 2023-11-27 02:16:51,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3681053.3333333335, ans=0.125 2023-11-27 02:17:13,203 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11100, loss[loss=0.07804, simple_loss=0.1073, pruned_loss=0.01867, audio_tagging_loss=0.005693, over 16367.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08942, pruned_loss=0.01209, audio_tagging_loss=0.008718, over 3055093.81 frames. ], batch size: 63, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:17:22,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3681253.3333333335, ans=0.0 2023-11-27 02:17:26,368 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.81 vs. 
limit=22.5 2023-11-27 02:17:34,496 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552200 2023-11-27 02:17:36,804 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.717e+01 8.882e+01 9.437e+01 1.044e+02 2.360e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-27 02:17:41,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3681320.0, ans=0.125 2023-11-27 02:17:53,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3681386.6666666665, ans=0.0 2023-11-27 02:17:56,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3681453.3333333335, ans=0.125 2023-11-27 02:18:01,078 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.77 vs. limit=15.0 2023-11-27 02:18:08,110 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11150, loss[loss=0.09418, simple_loss=0.1273, pruned_loss=0.02201, audio_tagging_loss=0.008529, over 16295.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09011, pruned_loss=0.0124, audio_tagging_loss=0.008826, over 3053058.53 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:18:17,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3681586.6666666665, ans=0.125 2023-11-27 02:18:20,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3681586.6666666665, ans=0.0 2023-11-27 02:18:27,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3681586.6666666665, ans=0.125 2023-11-27 02:18:28,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3681586.6666666665, ans=0.0 2023-11-27 02:18:30,421 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552250 2023-11-27 02:18:33,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3681653.3333333335, ans=0.1 2023-11-27 02:18:48,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3681720.0, ans=0.0 2023-11-27 02:19:03,612 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11200, loss[loss=0.03775, simple_loss=0.04765, pruned_loss=0.006661, audio_tagging_loss=0.00726, over 13810.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08857, pruned_loss=0.0122, audio_tagging_loss=0.008937, over 3047869.94 frames. 
], batch size: 56, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:19:23,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3681920.0, ans=0.2 2023-11-27 02:19:25,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3681986.6666666665, ans=0.0 2023-11-27 02:19:26,597 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552300 2023-11-27 02:19:28,646 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 8.953e+01 9.456e+01 1.019e+02 1.233e+02, threshold=1.891e+02, percent-clipped=1.0 2023-11-27 02:19:32,340 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2023-11-27 02:19:40,978 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.34 vs. limit=15.0 2023-11-27 02:19:48,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3682120.0, ans=0.0 2023-11-27 02:19:59,860 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11250, loss[loss=0.06074, simple_loss=0.08533, pruned_loss=0.00959, audio_tagging_loss=0.008485, over 16579.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08886, pruned_loss=0.01219, audio_tagging_loss=0.008919, over 3044331.74 frames. ], batch size: 63, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:20:21,700 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552350 2023-11-27 02:20:25,260 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.14 vs. limit=15.0 2023-11-27 02:20:28,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3682320.0, ans=0.0 2023-11-27 02:20:55,477 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11300, loss[loss=0.05751, simple_loss=0.07899, pruned_loss=0.009971, audio_tagging_loss=0.008039, over 15109.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08873, pruned_loss=0.01221, audio_tagging_loss=0.008825, over 3043822.20 frames. 
], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:20:57,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3682520.0, ans=0.0 2023-11-27 02:21:03,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3682520.0, ans=0.1 2023-11-27 02:21:04,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3682520.0, ans=0.1 2023-11-27 02:21:09,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3682586.6666666665, ans=0.015 2023-11-27 02:21:15,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3682586.6666666665, ans=0.125 2023-11-27 02:21:17,685 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552400 2023-11-27 02:21:21,101 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 9.061e+01 9.736e+01 1.047e+02 2.003e+02, threshold=1.947e+02, percent-clipped=1.0 2023-11-27 02:21:37,383 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.05 vs. limit=10.0 2023-11-27 02:21:50,796 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11350, loss[loss=0.05794, simple_loss=0.07682, pruned_loss=0.01022, audio_tagging_loss=0.009315, over 13584.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08845, pruned_loss=0.01205, audio_tagging_loss=0.008765, over 3045586.85 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:22:10,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3682920.0, ans=0.125 2023-11-27 02:22:13,337 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552450 2023-11-27 02:22:28,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3683053.3333333335, ans=0.125 2023-11-27 02:22:46,336 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11400, loss[loss=0.06323, simple_loss=0.08472, pruned_loss=0.01343, audio_tagging_loss=0.007445, over 15071.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08809, pruned_loss=0.01196, audio_tagging_loss=0.008699, over 3037034.94 frames. 
], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:22:48,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3683186.6666666665, ans=0.2 2023-11-27 02:23:08,756 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552500 2023-11-27 02:23:11,778 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.783e+01 9.008e+01 9.574e+01 1.020e+02 1.271e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 02:23:13,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3683320.0, ans=0.125 2023-11-27 02:23:30,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3683453.3333333335, ans=0.1 2023-11-27 02:23:34,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3683453.3333333335, ans=0.2 2023-11-27 02:23:41,789 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11450, loss[loss=0.08276, simple_loss=0.1147, pruned_loss=0.01938, audio_tagging_loss=0.006016, over 15048.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08781, pruned_loss=0.012, audio_tagging_loss=0.008644, over 3039347.50 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:23:49,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3683520.0, ans=0.2 2023-11-27 02:24:04,007 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552550 2023-11-27 02:24:05,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3683653.3333333335, ans=0.125 2023-11-27 02:24:16,213 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.74 vs. limit=15.0 2023-11-27 02:24:18,371 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.34 vs. limit=10.0 2023-11-27 02:24:29,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3683786.6666666665, ans=0.1 2023-11-27 02:24:37,128 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11500, loss[loss=0.04406, simple_loss=0.04934, pruned_loss=0.006694, audio_tagging_loss=0.01269, over 16258.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08854, pruned_loss=0.01199, audio_tagging_loss=0.008624, over 3044431.38 frames. 
], batch size: 64, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:24:47,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3683920.0, ans=0.125 2023-11-27 02:24:53,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3683920.0, ans=0.125 2023-11-27 02:24:59,762 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552600 2023-11-27 02:25:03,200 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.546e+01 8.865e+01 9.337e+01 9.934e+01 1.227e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-27 02:25:23,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3684120.0, ans=0.2 2023-11-27 02:25:26,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3684120.0, ans=0.05 2023-11-27 02:25:30,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3684120.0, ans=0.125 2023-11-27 02:25:33,362 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11550, loss[loss=0.08993, simple_loss=0.1388, pruned_loss=0.01598, audio_tagging_loss=0.004531, over 15829.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08869, pruned_loss=0.01204, audio_tagging_loss=0.008629, over 3043812.32 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:25:55,504 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552650 2023-11-27 02:26:05,038 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 02:26:20,179 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.99 vs. limit=12.0 2023-11-27 02:26:28,797 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11600, loss[loss=0.07888, simple_loss=0.1167, pruned_loss=0.01421, audio_tagging_loss=0.006307, over 15445.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08939, pruned_loss=0.01218, audio_tagging_loss=0.008594, over 3041973.24 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:26:39,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3684586.6666666665, ans=0.125 2023-11-27 02:26:50,925 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552700 2023-11-27 02:26:52,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3684653.3333333335, ans=0.125 2023-11-27 02:26:53,397 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.74 vs. 
limit=10.0 2023-11-27 02:26:53,973 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 8.971e+01 9.757e+01 1.054e+02 1.398e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-27 02:27:04,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3684720.0, ans=0.1 2023-11-27 02:27:05,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3684720.0, ans=0.0 2023-11-27 02:27:20,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3684786.6666666665, ans=0.1 2023-11-27 02:27:21,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3684786.6666666665, ans=0.09899494936611666 2023-11-27 02:27:24,035 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11650, loss[loss=0.07983, simple_loss=0.1006, pruned_loss=0.01769, audio_tagging_loss=0.01182, over 14638.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08956, pruned_loss=0.01212, audio_tagging_loss=0.008699, over 3042605.24 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:27:46,616 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552750 2023-11-27 02:27:48,135 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.40 vs. limit=6.0 2023-11-27 02:27:57,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3685053.3333333335, ans=0.015 2023-11-27 02:27:59,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=3685053.3333333335, ans=0.1 2023-11-27 02:28:17,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3685120.0, ans=0.125 2023-11-27 02:28:19,274 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11700, loss[loss=0.07375, simple_loss=0.09238, pruned_loss=0.01525, audio_tagging_loss=0.01231, over 16348.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08894, pruned_loss=0.01202, audio_tagging_loss=0.008746, over 3042665.05 frames. 
], batch size: 64, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:28:22,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3685186.6666666665, ans=0.2 2023-11-27 02:28:33,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3685253.3333333335, ans=0.125 2023-11-27 02:28:41,480 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552800 2023-11-27 02:28:44,819 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.894e+01 9.455e+01 1.028e+02 1.281e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-27 02:28:46,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3685320.0, ans=0.0 2023-11-27 02:28:56,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3685386.6666666665, ans=0.0 2023-11-27 02:29:00,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3685386.6666666665, ans=0.125 2023-11-27 02:29:02,333 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.49 vs. limit=22.5 2023-11-27 02:29:04,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3685453.3333333335, ans=0.125 2023-11-27 02:29:06,225 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.55 vs. limit=12.0 2023-11-27 02:29:15,177 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.60 vs. limit=10.0 2023-11-27 02:29:15,646 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11750, loss[loss=0.05912, simple_loss=0.07695, pruned_loss=0.01028, audio_tagging_loss=0.01037, over 14488.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08885, pruned_loss=0.01183, audio_tagging_loss=0.008703, over 3044378.01 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:29:27,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3685586.6666666665, ans=0.0 2023-11-27 02:29:35,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3685586.6666666665, ans=0.0 2023-11-27 02:29:38,006 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552850 2023-11-27 02:29:42,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3685653.3333333335, ans=0.2 2023-11-27 02:29:46,646 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.67 vs. 
limit=15.0 2023-11-27 02:29:47,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3685653.3333333335, ans=0.125 2023-11-27 02:29:47,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3685653.3333333335, ans=0.2 2023-11-27 02:29:50,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3685720.0, ans=0.125 2023-11-27 02:29:53,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3685720.0, ans=0.0 2023-11-27 02:29:54,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3685720.0, ans=0.1 2023-11-27 02:29:58,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3685720.0, ans=0.2 2023-11-27 02:30:00,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3685786.6666666665, ans=0.0 2023-11-27 02:30:09,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3685853.3333333335, ans=0.125 2023-11-27 02:30:10,482 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11800, loss[loss=0.08455, simple_loss=0.1184, pruned_loss=0.01598, audio_tagging_loss=0.009377, over 14829.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08938, pruned_loss=0.01193, audio_tagging_loss=0.008684, over 3043270.20 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:30:17,885 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.61 vs. limit=10.0 2023-11-27 02:30:26,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3685920.0, ans=0.0 2023-11-27 02:30:33,999 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552900 2023-11-27 02:30:37,094 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.095e+01 8.976e+01 9.582e+01 1.019e+02 1.579e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-27 02:30:40,966 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=15.0 2023-11-27 02:30:41,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3685986.6666666665, ans=0.0 2023-11-27 02:30:48,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3686053.3333333335, ans=0.125 2023-11-27 02:30:55,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3686120.0, ans=0.1 2023-11-27 02:31:03,916 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.44 vs. limit=15.0 2023-11-27 02:31:06,368 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11850, loss[loss=0.05529, simple_loss=0.07265, pruned_loss=0.008101, audio_tagging_loss=0.01086, over 14002.00 frames. 
], tot_loss[loss=0.06602, simple_loss=0.09028, pruned_loss=0.01213, audio_tagging_loss=0.008754, over 3050020.84 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:31:23,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3686253.3333333335, ans=0.0 2023-11-27 02:31:28,522 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 552950 2023-11-27 02:31:33,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3686320.0, ans=0.95 2023-11-27 02:31:40,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3686386.6666666665, ans=0.125 2023-11-27 02:31:45,295 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.84 vs. limit=15.0 2023-11-27 02:32:02,446 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11900, loss[loss=0.05076, simple_loss=0.06553, pruned_loss=0.008352, audio_tagging_loss=0.009647, over 16095.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08954, pruned_loss=0.01208, audio_tagging_loss=0.008875, over 3045477.23 frames. ], batch size: 61, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:32:12,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=3686586.6666666665, ans=0.1 2023-11-27 02:32:23,618 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553000 2023-11-27 02:32:27,258 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.408e+01 8.694e+01 9.517e+01 1.011e+02 1.462e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-27 02:32:29,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3686653.3333333335, ans=0.1 2023-11-27 02:32:40,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3686720.0, ans=0.5 2023-11-27 02:32:54,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3686786.6666666665, ans=0.0 2023-11-27 02:32:57,282 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 11950, loss[loss=0.05856, simple_loss=0.07304, pruned_loss=0.01272, audio_tagging_loss=0.009315, over 15651.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08792, pruned_loss=0.01195, audio_tagging_loss=0.009031, over 3044213.10 frames. 
], batch size: 61, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:33:08,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3686920.0, ans=0.035 2023-11-27 02:33:08,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3686920.0, ans=0.125 2023-11-27 02:33:20,243 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553050 2023-11-27 02:33:34,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3687053.3333333335, ans=0.125 2023-11-27 02:33:38,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3687053.3333333335, ans=0.125 2023-11-27 02:33:39,825 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.62 vs. limit=10.0 2023-11-27 02:33:51,421 INFO [train_asr.py:1235] (0/4) Epoch 46, batch 12000, loss[loss=0.07608, simple_loss=0.1071, pruned_loss=0.01476, audio_tagging_loss=0.007781, over 15105.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08916, pruned_loss=0.01219, audio_tagging_loss=0.009043, over 3051842.48 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:33:51,441 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-27 02:34:19,201 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5184, 3.4570, 3.8078, 3.7009], device='cuda:0') 2023-11-27 02:34:23,574 INFO [train_asr.py:1267] (0/4) Epoch 46, validation: loss=0.05804, simple_loss=0.0505, pruned_loss=0.005297, audio_tagging_loss=0.02749, over 4681554.00 frames. 2023-11-27 02:34:23,574 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-27 02:34:33,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3687253.3333333335, ans=0.09899494936611666 2023-11-27 02:34:44,511 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553100 2023-11-27 02:34:48,544 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-46.pt 2023-11-27 02:35:18,850 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.357e+01 8.955e+01 9.759e+01 1.053e+02 1.237e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-27 02:35:18,910 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 0, loss[loss=0.06621, simple_loss=0.08969, pruned_loss=0.00649, audio_tagging_loss=0.01487, over 15379.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08969, pruned_loss=0.00649, audio_tagging_loss=0.01487, over 15379.00 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 02:35:18,916 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-27 02:35:50,398 INFO [train_asr.py:1267] (0/4) Epoch 47, validation: loss=0.05785, simple_loss=0.05054, pruned_loss=0.005317, audio_tagging_loss=0.02726, over 4681554.00 frames. 
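The two validation records above (train_asr.py:1258-1268) bracket each pass over the held-out AudioSet eval cuts: the logger first announces "Computing validation loss", then reports the frame-averaged losses over 4681554 frames and the peak GPU memory. A minimal sketch of that loop, assuming the peak-memory figure comes from torch.cuda.max_memory_allocated and using a placeholder model interface (model(batch) returning a loss and a frame count) that is not icefall's actual signature:

    import torch

    def compute_validation_loss(model, valid_loader, device):
        # Accumulate frame-weighted losses over the whole validation set.
        model.eval()
        tot_loss, tot_frames = 0.0, 0
        with torch.no_grad():
            for batch in valid_loader:
                loss, num_frames = model(batch)  # placeholder interface
                tot_loss += loss.item() * num_frames
                tot_frames += num_frames
        model.train()
        # Peak allocation since process start, matching the "Maximum memory" line.
        peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"validation: loss={tot_loss / tot_frames:.4g}, over {tot_frames} frames.")
        print(f"Maximum memory allocated so far is {peak_mb}MB")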
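The recurring optim.py:476 records summarize the optimizer's gradient clipping: the five numbers are quartiles (min, 25%, median, 75%, max) of recently observed gradient norms, and in every record in this section the reported threshold equals Clipping_scale times the median, e.g. 2.0 * 9.759e+01 = 1.952e+02 in the record above. A sketch of median-based clipping consistent with those numbers; the buffer length and the exact cadence of the statistics are assumptions, not icefall's code:

    import torch
    from collections import deque

    class MedianGradClipper:
        # Clip the global grad norm against clipping_scale * median of recent norms.
        def __init__(self, clipping_scale=2.0, history=400):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=history)  # recent global grad norms
            self.num_seen = 0
            self.num_clipped = 0

        def __call__(self, parameters):
            grads = [p.grad for p in parameters if p.grad is not None]
            norm = torch.norm(torch.stack([g.norm() for g in grads]))
            self.norms.append(norm.item())
            q = torch.quantile(torch.tensor(list(self.norms)),
                               torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.clipping_scale * q[2].item()  # scale * median
            self.num_seen += 1
            if norm > threshold:
                self.num_clipped += 1
                for g in grads:
                    g.mul_(threshold / norm)  # rescale oversized gradients in place
            # quartiles, threshold, percent-clipped, as in the log records
            return q.tolist(), threshold, 100.0 * self.num_clipped / self.num_seen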
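Nearly every scaling.py:213 line tracks a ScheduledFloat: the named hyperparameter (a dropout probability, balancer prob, skip rate, scale_min, ...) is a function of the global batch_count rather than a constant, and ans is its current value. icefall's scaling.py defines these as (batch_count, value) breakpoints with linear interpolation between them; a sketch of that idea, with the (0, 0.3) -> (20000, 0.1) breakpoints chosen only to reproduce the ans=0.1 dropout values logged at batch_count around 3.68M:

    class ScheduledFloat:
        # Piecewise-linear schedule over the global batch count,
        # held constant outside the first/last breakpoints.
        def __init__(self, *points):
            self.points = sorted(points)  # (batch_count, value) pairs

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:  # interpolate inside this segment
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    assert dropout_p.value(3679986.6666666665) == 0.1  # matches the logged ans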
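The scaling.py:1022 "Whitening" lines each report a flatness metric for a named activation against a scheduled limit (metric=X vs. limit=Y); when the metric drifts above the limit, the Whiten module applies a corrective gradient that pushes the per-group feature covariance back toward a multiple of the identity. A rough sketch of such a metric, near 1.0 for already-white features and growing with channel correlation; this approximates the logged quantity and is not icefall's exact formula:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
        # x: (num_frames, num_channels); channels are split into num_groups groups.
        num_frames, num_channels = x.shape
        cpg = num_channels // num_groups  # channels per group
        x = x.reshape(num_frames, num_groups, cpg).transpose(0, 1)
        covar = x.transpose(1, 2) @ x / num_frames        # (num_groups, cpg, cpg)
        diag_mean = covar.diagonal(dim1=1, dim2=2).mean()
        sq_mean = (covar ** 2).sum(dim=(1, 2)).mean() / cpg
        return (sq_mean / (diag_mean ** 2 + 1e-20)).item()

    print(whitening_metric(torch.randn(1000, 128), num_groups=4))  # ~1.0 for white noise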
2023-11-27 02:35:50,399 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-27 02:35:57,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3687340.0, ans=0.2 2023-11-27 02:36:14,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3687473.3333333335, ans=0.0 2023-11-27 02:36:41,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3687606.6666666665, ans=0.125 2023-11-27 02:36:42,299 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553150 2023-11-27 02:36:45,423 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 50, loss[loss=0.06605, simple_loss=0.08618, pruned_loss=0.008399, audio_tagging_loss=0.01455, over 14780.00 frames. ], tot_loss[loss=0.07142, simple_loss=0.08694, pruned_loss=0.01137, audio_tagging_loss=0.01657, over 689118.61 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 02:36:45,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3687673.3333333335, ans=0.0 2023-11-27 02:36:51,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3687673.3333333335, ans=0.2 2023-11-27 02:37:02,818 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:37:03,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3687740.0, ans=0.1 2023-11-27 02:37:04,371 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.87 vs. limit=15.0 2023-11-27 02:37:19,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3687873.3333333335, ans=0.125 2023-11-27 02:37:37,437 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553200 2023-11-27 02:37:41,653 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.064e+01 9.815e+01 1.050e+02 1.145e+02 1.417e+02, threshold=2.101e+02, percent-clipped=0.0 2023-11-27 02:37:41,690 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 100, loss[loss=0.09102, simple_loss=0.1258, pruned_loss=0.0184, audio_tagging_loss=0.009694, over 15711.00 frames. ], tot_loss[loss=0.07094, simple_loss=0.08678, pruned_loss=0.01156, audio_tagging_loss=0.016, over 1208745.64 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 02:37:54,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3688073.3333333335, ans=0.125 2023-11-27 02:37:58,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3688073.3333333335, ans=0.125 2023-11-27 02:38:01,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3688073.3333333335, ans=0.125 2023-11-27 02:38:04,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3688140.0, ans=0.1 2023-11-27 02:38:07,374 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.36 vs. 
limit=15.0 2023-11-27 02:38:11,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3688140.0, ans=0.1 2023-11-27 02:38:16,714 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.61 vs. limit=10.0 2023-11-27 02:38:21,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3688206.6666666665, ans=0.125 2023-11-27 02:38:31,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3688273.3333333335, ans=0.125 2023-11-27 02:38:31,437 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.91 vs. limit=15.0 2023-11-27 02:38:34,162 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553250 2023-11-27 02:38:37,321 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 150, loss[loss=0.06698, simple_loss=0.08705, pruned_loss=0.01154, audio_tagging_loss=0.01192, over 14427.00 frames. ], tot_loss[loss=0.06917, simple_loss=0.0867, pruned_loss=0.01139, audio_tagging_loss=0.01444, over 1615332.33 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 02:38:44,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3688340.0, ans=0.05 2023-11-27 02:38:56,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3688406.6666666665, ans=0.125 2023-11-27 02:39:29,317 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553300 2023-11-27 02:39:29,798 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.14 vs. limit=15.0 2023-11-27 02:39:32,425 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 200, loss[loss=0.08289, simple_loss=0.1121, pruned_loss=0.01688, audio_tagging_loss=0.009961, over 15940.00 frames. ], tot_loss[loss=0.06793, simple_loss=0.08691, pruned_loss=0.01168, audio_tagging_loss=0.01279, over 1938973.26 frames. 
], batch size: 54, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 02:39:33,819 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:39:34,552 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.211e+01 9.198e+01 9.713e+01 1.048e+02 1.227e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-27 02:39:39,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3688673.3333333335, ans=0.125 2023-11-27 02:40:08,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3688873.3333333335, ans=0.0 2023-11-27 02:40:13,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3688873.3333333335, ans=0.125 2023-11-27 02:40:18,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3688940.0, ans=0.125 2023-11-27 02:40:20,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3688940.0, ans=0.125 2023-11-27 02:40:21,716 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.49 vs. limit=22.5 2023-11-27 02:40:24,216 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553350 2023-11-27 02:40:27,872 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 250, loss[loss=0.06047, simple_loss=0.08702, pruned_loss=0.01062, audio_tagging_loss=0.006345, over 15182.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.08791, pruned_loss=0.01197, audio_tagging_loss=0.01148, over 2184082.71 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 02:40:43,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3689073.3333333335, ans=0.125 2023-11-27 02:40:47,945 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.56 vs. limit=15.0 2023-11-27 02:40:52,237 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.38 vs. limit=22.5 2023-11-27 02:40:52,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3689140.0, ans=0.025 2023-11-27 02:40:59,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3689140.0, ans=0.125 2023-11-27 02:41:07,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3689206.6666666665, ans=0.0 2023-11-27 02:41:15,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3689273.3333333335, ans=0.0 2023-11-27 02:41:21,363 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553400 2023-11-27 02:41:24,694 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 300, loss[loss=0.06457, simple_loss=0.09046, pruned_loss=0.01142, audio_tagging_loss=0.00792, over 16318.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.08878, pruned_loss=0.01206, audio_tagging_loss=0.0107, over 2378784.93 frames. 
], batch size: 59, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 02:41:26,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3689340.0, ans=0.2 2023-11-27 02:41:26,838 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.875e+01 9.232e+01 1.015e+02 1.128e+02 1.500e+02, threshold=2.030e+02, percent-clipped=0.0 2023-11-27 02:41:36,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3689406.6666666665, ans=0.0 2023-11-27 02:41:46,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3689473.3333333335, ans=0.125 2023-11-27 02:41:51,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3689473.3333333335, ans=0.125 2023-11-27 02:42:16,582 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553450 2023-11-27 02:42:19,758 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 350, loss[loss=0.07581, simple_loss=0.1063, pruned_loss=0.01206, audio_tagging_loss=0.01063, over 15591.00 frames. ], tot_loss[loss=0.06753, simple_loss=0.09046, pruned_loss=0.01232, audio_tagging_loss=0.009977, over 2529135.30 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 02:42:51,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3689806.6666666665, ans=0.2 2023-11-27 02:42:59,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3689873.3333333335, ans=0.0 2023-11-27 02:43:11,779 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553500 2023-11-27 02:43:15,465 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 400, loss[loss=0.05767, simple_loss=0.07479, pruned_loss=0.009356, audio_tagging_loss=0.01091, over 14660.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09013, pruned_loss=0.01231, audio_tagging_loss=0.009687, over 2648295.22 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:43:18,099 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.006e+01 8.931e+01 9.402e+01 1.042e+02 1.214e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-27 02:43:26,914 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.83 vs. 
limit=22.5 2023-11-27 02:43:29,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3690073.3333333335, ans=0.125 2023-11-27 02:43:35,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3690073.3333333335, ans=0.125 2023-11-27 02:43:36,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3690073.3333333335, ans=0.0 2023-11-27 02:44:01,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3690273.3333333335, ans=0.125 2023-11-27 02:44:04,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3690273.3333333335, ans=0.0 2023-11-27 02:44:08,452 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553550 2023-11-27 02:44:11,513 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 450, loss[loss=0.06112, simple_loss=0.0749, pruned_loss=0.01263, audio_tagging_loss=0.01105, over 13915.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09031, pruned_loss=0.01227, audio_tagging_loss=0.009432, over 2737602.97 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:44:11,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3690340.0, ans=0.125 2023-11-27 02:44:17,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3690340.0, ans=0.2 2023-11-27 02:44:23,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3690406.6666666665, ans=0.1 2023-11-27 02:44:38,322 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.69 vs. limit=15.0 2023-11-27 02:45:01,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3690606.6666666665, ans=0.0 2023-11-27 02:45:03,905 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553600 2023-11-27 02:45:07,297 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 500, loss[loss=0.06293, simple_loss=0.08366, pruned_loss=0.01185, audio_tagging_loss=0.009256, over 15450.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09074, pruned_loss=0.01228, audio_tagging_loss=0.009173, over 2810313.19 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:45:08,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3690673.3333333335, ans=0.125 2023-11-27 02:45:09,477 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 8.932e+01 9.491e+01 1.008e+02 1.797e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-27 02:45:25,908 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.74 vs. limit=22.5 2023-11-27 02:45:33,000 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.42 vs. 
limit=22.5 2023-11-27 02:45:33,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3690806.6666666665, ans=0.2 2023-11-27 02:45:34,995 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2023-11-27 02:45:47,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3690873.3333333335, ans=0.1 2023-11-27 02:45:47,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3690873.3333333335, ans=0.0 2023-11-27 02:45:57,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3690940.0, ans=0.125 2023-11-27 02:45:59,600 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553650 2023-11-27 02:46:02,699 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 550, loss[loss=0.0839, simple_loss=0.1126, pruned_loss=0.019, audio_tagging_loss=0.008591, over 15324.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.09124, pruned_loss=0.0123, audio_tagging_loss=0.008967, over 2864730.37 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:46:18,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3691073.3333333335, ans=0.0 2023-11-27 02:46:29,715 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.59 vs. limit=22.5 2023-11-27 02:46:37,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3691206.6666666665, ans=0.125 2023-11-27 02:46:40,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3691206.6666666665, ans=0.1 2023-11-27 02:46:55,933 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553700 2023-11-27 02:46:59,206 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.30 vs. limit=22.5 2023-11-27 02:46:59,545 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 600, loss[loss=0.0753, simple_loss=0.1054, pruned_loss=0.01356, audio_tagging_loss=0.009033, over 15696.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09047, pruned_loss=0.01218, audio_tagging_loss=0.00887, over 2907935.92 frames. 
], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:47:01,682 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.959e+01 8.835e+01 9.409e+01 1.013e+02 1.233e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 02:47:09,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3691406.6666666665, ans=0.125 2023-11-27 02:47:36,378 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:47:37,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=3691540.0, ans=15.0 2023-11-27 02:47:38,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3691540.0, ans=0.125 2023-11-27 02:47:42,164 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:47:49,087 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.65 vs. limit=5.0 2023-11-27 02:47:51,515 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553750 2023-11-27 02:47:51,991 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.66 vs. limit=6.0 2023-11-27 02:47:55,225 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 650, loss[loss=0.04121, simple_loss=0.04967, pruned_loss=0.005808, audio_tagging_loss=0.01057, over 14965.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09026, pruned_loss=0.01219, audio_tagging_loss=0.00873, over 2946351.02 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:48:01,828 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:48:01,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3691673.3333333335, ans=0.125 2023-11-27 02:48:07,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3691740.0, ans=0.125 2023-11-27 02:48:15,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3691740.0, ans=0.125 2023-11-27 02:48:37,014 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.52 vs. limit=22.5 2023-11-27 02:48:37,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3691873.3333333335, ans=0.1 2023-11-27 02:48:47,623 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553800 2023-11-27 02:48:47,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3691940.0, ans=0.125 2023-11-27 02:48:51,058 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 700, loss[loss=0.06142, simple_loss=0.08391, pruned_loss=0.01239, audio_tagging_loss=0.007075, over 16194.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.09041, pruned_loss=0.01202, audio_tagging_loss=0.008705, over 2971880.80 frames. 
], batch size: 62, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:48:52,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3692006.6666666665, ans=0.0 2023-11-27 02:48:53,138 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.109e+01 8.860e+01 9.509e+01 1.038e+02 1.459e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-27 02:48:59,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3692006.6666666665, ans=0.1 2023-11-27 02:49:06,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3692073.3333333335, ans=0.0 2023-11-27 02:49:09,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3692073.3333333335, ans=0.2 2023-11-27 02:49:20,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3692140.0, ans=0.2 2023-11-27 02:49:44,268 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553850 2023-11-27 02:49:44,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3692273.3333333335, ans=0.125 2023-11-27 02:49:47,903 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 750, loss[loss=0.06695, simple_loss=0.0863, pruned_loss=0.01344, audio_tagging_loss=0.01035, over 14736.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09063, pruned_loss=0.01219, audio_tagging_loss=0.008735, over 2995476.97 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:49:49,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3692340.0, ans=0.2 2023-11-27 02:49:50,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3692340.0, ans=0.125 2023-11-27 02:50:11,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3692473.3333333335, ans=0.125 2023-11-27 02:50:40,018 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553900 2023-11-27 02:50:43,109 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 800, loss[loss=0.07847, simple_loss=0.1129, pruned_loss=0.01589, audio_tagging_loss=0.006128, over 15884.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09089, pruned_loss=0.01213, audio_tagging_loss=0.008782, over 3018506.96 frames. 
], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 02:50:45,225 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.898e+01 9.051e+01 9.571e+01 1.030e+02 1.342e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-27 02:51:03,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3692740.0, ans=0.1 2023-11-27 02:51:22,758 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:51:25,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3692873.3333333335, ans=0.0 2023-11-27 02:51:35,957 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 553950 2023-11-27 02:51:39,021 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 850, loss[loss=0.07751, simple_loss=0.1116, pruned_loss=0.01389, audio_tagging_loss=0.007811, over 15318.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09045, pruned_loss=0.01214, audio_tagging_loss=0.008925, over 3023480.65 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 02:51:40,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3693006.6666666665, ans=0.2 2023-11-27 02:51:54,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3693073.3333333335, ans=0.125 2023-11-27 02:52:30,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3693273.3333333335, ans=0.125 2023-11-27 02:52:32,224 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554000 2023-11-27 02:52:35,565 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 900, loss[loss=0.07521, simple_loss=0.1065, pruned_loss=0.01457, audio_tagging_loss=0.00736, over 15102.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08989, pruned_loss=0.01208, audio_tagging_loss=0.008909, over 3029504.55 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:52:39,250 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 8.856e+01 9.562e+01 1.034e+02 1.273e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-27 02:52:43,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3693340.0, ans=0.125 2023-11-27 02:53:06,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3693473.3333333335, ans=0.2 2023-11-27 02:53:12,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3693540.0, ans=0.09899494936611666 2023-11-27 02:53:13,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3693540.0, ans=0.09899494936611666 2023-11-27 02:53:28,056 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554050 2023-11-27 02:53:31,207 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 950, loss[loss=0.07832, simple_loss=0.1196, pruned_loss=0.01336, audio_tagging_loss=0.005152, over 15710.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08983, pruned_loss=0.01194, audio_tagging_loss=0.008788, over 3036113.28 frames. 
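The many `scaling.py:213` records print the current value (`ans=...`) of a `ScheduledFloat` hyperparameter at the current `batch_count`: dropout rates, balancer probabilities and skip rates are not constants but piecewise-linear functions of how many batches the model has seen. A sketch of such a schedule (an illustrative re-implementation with made-up breakpoints, not the icefall class itself):

```python
class ScheduledFloat:
    """Piecewise-linear schedule over batch_count, given (batch_count, value) points."""
    def __init__(self, *points):
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

# Assumed breakpoints: by batch_count ~3.69e6 any warm-up schedule has long since
# settled on its final value, which is why the log keeps printing ans=0.125.
prob = ScheduledFloat((0.0, 0.3), (20000.0, 0.125))
print(prob.value(3691673.3))   # -> 0.125
```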
], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:53:33,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3693673.3333333335, ans=0.2 2023-11-27 02:53:59,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3693806.6666666665, ans=0.04949747468305833 2023-11-27 02:54:04,776 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.26 vs. limit=10.0 2023-11-27 02:54:23,168 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554100 2023-11-27 02:54:24,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3693940.0, ans=0.125 2023-11-27 02:54:25,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3694006.6666666665, ans=0.0 2023-11-27 02:54:26,296 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1000, loss[loss=0.06257, simple_loss=0.08855, pruned_loss=0.009254, audio_tagging_loss=0.009041, over 15610.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08974, pruned_loss=0.01189, audio_tagging_loss=0.008666, over 3042001.44 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:54:29,441 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.094e+01 9.006e+01 9.495e+01 1.025e+02 1.376e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-27 02:54:33,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3694006.6666666665, ans=0.0 2023-11-27 02:54:34,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3694006.6666666665, ans=0.0 2023-11-27 02:54:34,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3694006.6666666665, ans=0.1 2023-11-27 02:54:35,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3694006.6666666665, ans=0.125 2023-11-27 02:54:38,601 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.74 vs. limit=15.0 2023-11-27 02:54:39,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3694073.3333333335, ans=0.1 2023-11-27 02:54:48,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3694140.0, ans=0.125 2023-11-27 02:54:49,613 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 02:55:01,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=3694206.6666666665, ans=0.2 2023-11-27 02:55:15,113 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.03 vs. limit=12.0 2023-11-27 02:55:18,976 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.84 vs. limit=15.0 2023-11-27 02:55:19,481 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554150 2023-11-27 02:55:22,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3694340.0, ans=0.0 2023-11-27 02:55:23,090 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1050, loss[loss=0.09053, simple_loss=0.1331, pruned_loss=0.01782, audio_tagging_loss=0.006155, over 15837.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08937, pruned_loss=0.01183, audio_tagging_loss=0.008585, over 3041134.02 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:55:25,769 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.61 vs. limit=6.0 2023-11-27 02:55:48,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3694473.3333333335, ans=0.125 2023-11-27 02:56:00,979 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.06 vs. limit=15.0 2023-11-27 02:56:15,436 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554200 2023-11-27 02:56:18,832 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1100, loss[loss=0.06622, simple_loss=0.09077, pruned_loss=0.01283, audio_tagging_loss=0.008006, over 15140.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08981, pruned_loss=0.01197, audio_tagging_loss=0.008568, over 3040921.77 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:56:19,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3694673.3333333335, ans=0.125 2023-11-27 02:56:19,894 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 02:56:21,957 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.547e+01 8.965e+01 9.717e+01 1.039e+02 1.284e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-27 02:56:26,941 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.25 vs. limit=22.5 2023-11-27 02:56:34,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3694740.0, ans=0.0 2023-11-27 02:56:35,077 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.84 vs. 
limit=22.5 2023-11-27 02:56:41,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3694806.6666666665, ans=0.1 2023-11-27 02:56:50,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3694873.3333333335, ans=0.125 2023-11-27 02:57:10,360 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554250 2023-11-27 02:57:13,425 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1150, loss[loss=0.08453, simple_loss=0.1249, pruned_loss=0.01695, audio_tagging_loss=0.005125, over 16201.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08987, pruned_loss=0.01188, audio_tagging_loss=0.008492, over 3043492.19 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:57:31,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3695073.3333333335, ans=0.125 2023-11-27 02:57:36,284 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.57 vs. limit=22.5 2023-11-27 02:57:37,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3695140.0, ans=0.125 2023-11-27 02:57:42,521 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0 2023-11-27 02:57:44,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3695140.0, ans=0.125 2023-11-27 02:57:45,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3695140.0, ans=0.125 2023-11-27 02:57:57,284 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.55 vs. limit=15.0 2023-11-27 02:58:01,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3695273.3333333335, ans=0.0 2023-11-27 02:58:02,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3695273.3333333335, ans=0.125 2023-11-27 02:58:03,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3695273.3333333335, ans=0.1 2023-11-27 02:58:04,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3695273.3333333335, ans=0.125 2023-11-27 02:58:04,796 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.87 vs. limit=15.0 2023-11-27 02:58:05,418 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554300 2023-11-27 02:58:09,072 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1200, loss[loss=0.06412, simple_loss=0.08781, pruned_loss=0.01153, audio_tagging_loss=0.008688, over 15471.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.09038, pruned_loss=0.01216, audio_tagging_loss=0.008493, over 3042909.62 frames. 
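The WARNING records exclude AudioSet cuts whose dummy transcript is longer than the cut can support: a 1-second clip has 100 feature frames, only 23 survive subsampling, and 23 frames cannot align 24 tokens, so the transducer loss would be undefined. A sketch of that filter with hypothetical helper names; the subsampled-length formula is an assumption chosen only because it matches the logged 100 -> 23 example:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # One plausible length rule for a ~4x convolutional subsampler;
    # picked so that 100 input frames -> 23 output frames, as in the WARNING.
    return ((num_frames - 7) // 2) // 2

def keep_cut(num_frames: int, tokens: list) -> bool:
    """Keep a cut only if its token sequence is alignable after subsampling."""
    return frames_after_subsampling(num_frames) >= len(tokens)

print(frames_after_subsampling(100))    # 23
print(keep_cut(100, ["tok"] * 24))      # False -> excluded from training
```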
], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 02:58:12,706 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.838e+01 8.950e+01 9.657e+01 1.053e+02 1.302e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-27 02:58:30,914 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.22 vs. limit=22.5 2023-11-27 02:58:36,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3695473.3333333335, ans=0.125 2023-11-27 02:58:54,682 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.84 vs. limit=6.0 2023-11-27 02:58:57,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3695606.6666666665, ans=0.0 2023-11-27 02:59:02,062 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554350 2023-11-27 02:59:05,175 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1250, loss[loss=0.05764, simple_loss=0.08231, pruned_loss=0.009901, audio_tagging_loss=0.006582, over 16143.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08923, pruned_loss=0.01203, audio_tagging_loss=0.00851, over 3045599.81 frames. ], batch size: 62, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 02:59:06,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3695673.3333333335, ans=0.125 2023-11-27 02:59:14,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3695740.0, ans=0.0 2023-11-27 02:59:22,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3695740.0, ans=0.0 2023-11-27 02:59:27,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3695806.6666666665, ans=0.125 2023-11-27 02:59:50,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3695940.0, ans=0.1 2023-11-27 02:59:51,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3695940.0, ans=0.1 2023-11-27 02:59:57,569 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554400 2023-11-27 03:00:01,027 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1300, loss[loss=0.0702, simple_loss=0.09297, pruned_loss=0.01482, audio_tagging_loss=0.008893, over 16408.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08942, pruned_loss=0.01198, audio_tagging_loss=0.008506, over 3045199.98 frames. 
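Every status record in this stretch shows the same `lr: 1.44e-03` even as training advances. With a scheduler in the Eden family, the learning rate decays smoothly in both the global batch index and the epoch, and by epoch 47 both factors change only a few parts per thousand over thousands of batches. A sketch of an Eden-style rule; the parameter values below are illustrative assumptions, not read from this run's configuration:

```python
def eden_lr(base_lr: float, step: int, epoch: float,
            lr_batches: float, lr_epochs: float) -> float:
    """Eden-style decay: a -0.25 power in both step and epoch."""
    step_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * step_factor * epoch_factor

# With assumed settings, late in training the rate sits near 1.4e-03 and is
# effectively flat, consistent with the log printing 1.44e-03 batch after batch.
print(eden_lr(0.045, step=553_750, epoch=47, lr_batches=7500, lr_epochs=3.5))
```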
], batch size: 62, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:00:04,121 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 8.987e+01 9.539e+01 1.033e+02 1.348e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-27 03:00:33,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3696140.0, ans=0.0 2023-11-27 03:00:41,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3696206.6666666665, ans=0.125 2023-11-27 03:00:53,265 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554450 2023-11-27 03:00:55,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3696340.0, ans=0.125 2023-11-27 03:00:56,944 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1350, loss[loss=0.04856, simple_loss=0.06056, pruned_loss=0.008275, audio_tagging_loss=0.01001, over 14777.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08886, pruned_loss=0.01177, audio_tagging_loss=0.008514, over 3042340.45 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:01:05,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3696340.0, ans=0.125 2023-11-27 03:01:16,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3696406.6666666665, ans=0.2 2023-11-27 03:01:27,798 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.38 vs. limit=15.0 2023-11-27 03:01:35,762 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 03:01:35,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3696540.0, ans=0.2 2023-11-27 03:01:36,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3696540.0, ans=0.1 2023-11-27 03:01:50,033 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554500 2023-11-27 03:01:53,154 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1400, loss[loss=0.07036, simple_loss=0.1016, pruned_loss=0.0133, audio_tagging_loss=0.006252, over 14933.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08917, pruned_loss=0.01185, audio_tagging_loss=0.008697, over 3046051.93 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:01:57,318 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 8.796e+01 9.481e+01 1.017e+02 1.266e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-27 03:01:57,758 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.05 vs. 
limit=10.0 2023-11-27 03:02:00,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3696673.3333333335, ans=0.07 2023-11-27 03:02:00,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3696673.3333333335, ans=0.125 2023-11-27 03:02:31,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3696873.3333333335, ans=0.125 2023-11-27 03:02:44,934 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554550 2023-11-27 03:02:48,012 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1450, loss[loss=0.06103, simple_loss=0.07499, pruned_loss=0.01339, audio_tagging_loss=0.01014, over 14724.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08831, pruned_loss=0.01174, audio_tagging_loss=0.008737, over 3047778.97 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:03:40,314 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554600 2023-11-27 03:03:43,658 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1500, loss[loss=0.05683, simple_loss=0.07983, pruned_loss=0.006264, audio_tagging_loss=0.01065, over 15584.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08957, pruned_loss=0.01203, audio_tagging_loss=0.008769, over 3047956.38 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:03:48,364 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 9.023e+01 9.880e+01 1.062e+02 1.307e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-27 03:03:52,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=3697340.0, ans=0.02 2023-11-27 03:03:53,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3697340.0, ans=0.0 2023-11-27 03:04:01,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3697406.6666666665, ans=0.0 2023-11-27 03:04:11,468 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 03:04:35,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3697606.6666666665, ans=0.0 2023-11-27 03:04:36,695 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554650 2023-11-27 03:04:40,349 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1550, loss[loss=0.06019, simple_loss=0.08273, pruned_loss=0.01177, audio_tagging_loss=0.007059, over 14598.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08995, pruned_loss=0.01208, audio_tagging_loss=0.008792, over 3044347.15 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:04:48,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3697673.3333333335, ans=0.1 2023-11-27 03:04:49,936 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.39 vs. 
limit=15.0 2023-11-27 03:04:51,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3697740.0, ans=0.0 2023-11-27 03:05:18,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3697873.3333333335, ans=0.125 2023-11-27 03:05:32,965 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554700 2023-11-27 03:05:36,067 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1600, loss[loss=0.05134, simple_loss=0.06913, pruned_loss=0.008859, audio_tagging_loss=0.007914, over 15055.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08937, pruned_loss=0.01198, audio_tagging_loss=0.008935, over 3046170.38 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:05:41,276 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 8.979e+01 9.588e+01 1.025e+02 1.510e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 03:05:42,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3698006.6666666665, ans=0.125 2023-11-27 03:05:49,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3698073.3333333335, ans=0.1 2023-11-27 03:05:50,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3698073.3333333335, ans=0.125 2023-11-27 03:05:54,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3698073.3333333335, ans=0.125 2023-11-27 03:06:21,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3698273.3333333335, ans=0.0 2023-11-27 03:06:24,199 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.24 vs. limit=15.0 2023-11-27 03:06:25,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3698273.3333333335, ans=0.1 2023-11-27 03:06:27,980 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554750 2023-11-27 03:06:31,107 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1650, loss[loss=0.06884, simple_loss=0.09087, pruned_loss=0.01436, audio_tagging_loss=0.009044, over 14949.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08907, pruned_loss=0.01183, audio_tagging_loss=0.008811, over 3045475.08 frames. 
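The `scaling.py:1022` records compare a whitening `metric` against a `limit` for various activations (e.g. `metric=9.39 vs. limit=15.0`); the whitening module only intervenes when the activation covariance becomes too anisotropic. The exact icefall formula is not reproduced here; the sketch below uses an eigenvalue-spread ratio as an illustrative stand-in that is 1.0 for perfectly white features and grows when a few directions dominate:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """x: (frames, channels). Illustrative metric: mean squared eigenvalue of
    the per-group channel covariance over the squared mean eigenvalue."""
    metrics = []
    for g in x.chunk(num_groups, dim=-1):
        g = g - g.mean(dim=0, keepdim=True)
        cov = (g.T @ g) / g.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        metrics.append((eigs ** 2).mean() / eigs.mean() ** 2)
    return float(torch.stack(metrics).mean())

x = torch.randn(1000, 128) @ torch.randn(128, 128)   # correlated features
print(whitening_metric(x), "vs. limit=5.0")          # intervene only above the limit
```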
], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:06:41,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3698406.6666666665, ans=0.1 2023-11-27 03:06:51,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3698406.6666666665, ans=0.125 2023-11-27 03:06:52,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3698473.3333333335, ans=0.125 2023-11-27 03:06:56,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3698473.3333333335, ans=0.2 2023-11-27 03:06:59,174 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.23 vs. limit=10.0 2023-11-27 03:07:08,657 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.65 vs. limit=6.0 2023-11-27 03:07:23,990 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554800 2023-11-27 03:07:27,473 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1700, loss[loss=0.0694, simple_loss=0.09984, pruned_loss=0.01058, audio_tagging_loss=0.008903, over 16458.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08919, pruned_loss=0.01195, audio_tagging_loss=0.008866, over 3046103.54 frames. ], batch size: 62, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:07:30,855 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.95 vs. limit=22.5 2023-11-27 03:07:33,350 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.387e+01 8.918e+01 9.493e+01 1.014e+02 1.179e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-27 03:07:35,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3698673.3333333335, ans=0.0 2023-11-27 03:07:57,820 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.28 vs. limit=6.0 2023-11-27 03:08:02,629 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.32 vs. limit=15.0 2023-11-27 03:08:20,267 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554850 2023-11-27 03:08:24,020 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1750, loss[loss=0.05827, simple_loss=0.07673, pruned_loss=0.01057, audio_tagging_loss=0.009329, over 15952.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08788, pruned_loss=0.01172, audio_tagging_loss=0.00885, over 3055020.94 frames. 
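Every 50 batches, `model.py:807` re-asserts `Freeze_encoder: False`, i.e. encoder weights are still being updated alongside the current global batch index. Where such a flag is enabled (for fine-tuning, or only during warm-up), the usual mechanics are just toggling `requires_grad`. A minimal sketch of that pattern with hypothetical helper names, not this script's code:

```python
import torch.nn as nn

def set_encoder_frozen(model: nn.Module, frozen: bool) -> None:
    """Toggle gradient updates for the encoder without touching other modules."""
    for p in model.encoder.parameters():
        p.requires_grad = not frozen

def maybe_freeze_encoder(model: nn.Module, batch_idx: int,
                         freeze_encoder_steps: int) -> None:
    # Freeze only during an initial window; a negative setting disables freezing.
    frozen = 0 <= batch_idx < freeze_encoder_steps
    set_encoder_frozen(model, frozen)
```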
], batch size: 61, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:08:32,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3699006.6666666665, ans=0.1 2023-11-27 03:08:48,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3699140.0, ans=0.125 2023-11-27 03:09:16,561 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554900 2023-11-27 03:09:19,729 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1800, loss[loss=0.09169, simple_loss=0.131, pruned_loss=0.0193, audio_tagging_loss=0.006865, over 15360.00 frames. ], tot_loss[loss=0.06418, simple_loss=0.08743, pruned_loss=0.01166, audio_tagging_loss=0.008805, over 3052217.00 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 03:09:26,754 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.056e+01 8.924e+01 9.583e+01 9.926e+01 1.257e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-27 03:09:35,238 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.60 vs. limit=15.0 2023-11-27 03:09:35,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3699406.6666666665, ans=0.125 2023-11-27 03:09:44,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3699473.3333333335, ans=0.0 2023-11-27 03:09:49,477 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.37 vs. limit=10.0 2023-11-27 03:10:12,809 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 554950 2023-11-27 03:10:16,020 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1850, loss[loss=0.0635, simple_loss=0.08578, pruned_loss=0.01351, audio_tagging_loss=0.007097, over 15424.00 frames. ], tot_loss[loss=0.06419, simple_loss=0.0876, pruned_loss=0.01172, audio_tagging_loss=0.008669, over 3050840.15 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 03:10:31,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3699740.0, ans=0.05 2023-11-27 03:10:39,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3699806.6666666665, ans=0.125 2023-11-27 03:10:46,253 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. 
limit=6.0 2023-11-27 03:10:52,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3699873.3333333335, ans=0.07 2023-11-27 03:10:59,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3699940.0, ans=0.125 2023-11-27 03:11:04,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3699940.0, ans=0.125 2023-11-27 03:11:06,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3699940.0, ans=0.1 2023-11-27 03:11:08,673 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555000 2023-11-27 03:11:10,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3699940.0, ans=0.0 2023-11-27 03:11:12,133 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1900, loss[loss=0.07353, simple_loss=0.1016, pruned_loss=0.01457, audio_tagging_loss=0.008175, over 15531.00 frames. ], tot_loss[loss=0.06387, simple_loss=0.08723, pruned_loss=0.01165, audio_tagging_loss=0.008605, over 3051824.92 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 03:11:18,506 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 9.107e+01 9.832e+01 1.049e+02 1.489e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-27 03:11:21,670 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.65 vs. limit=22.5 2023-11-27 03:12:05,085 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555050 2023-11-27 03:12:05,570 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.13 vs. limit=15.0 2023-11-27 03:12:08,238 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 1950, loss[loss=0.05264, simple_loss=0.06511, pruned_loss=0.008489, audio_tagging_loss=0.0116, over 15989.00 frames. ], tot_loss[loss=0.06337, simple_loss=0.08662, pruned_loss=0.01145, audio_tagging_loss=0.008608, over 3054743.64 frames. ], batch size: 62, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 03:12:18,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3700406.6666666665, ans=15.0 2023-11-27 03:12:26,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3700406.6666666665, ans=0.125 2023-11-27 03:12:30,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3700473.3333333335, ans=0.1 2023-11-27 03:12:39,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3700473.3333333335, ans=0.125 2023-11-27 03:12:40,965 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.58 vs. 
limit=15.0 2023-11-27 03:12:57,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3700606.6666666665, ans=0.125 2023-11-27 03:13:00,670 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555100 2023-11-27 03:13:03,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3700673.3333333335, ans=0.125 2023-11-27 03:13:04,290 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2000, loss[loss=0.06779, simple_loss=0.08703, pruned_loss=0.0156, audio_tagging_loss=0.008675, over 16049.00 frames. ], tot_loss[loss=0.06387, simple_loss=0.08717, pruned_loss=0.01169, audio_tagging_loss=0.008594, over 3046380.91 frames. ], batch size: 63, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:13:05,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3700673.3333333335, ans=0.07 2023-11-27 03:13:11,254 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.430e+01 8.780e+01 9.356e+01 1.007e+02 1.266e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-27 03:13:11,944 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.46 vs. limit=15.0 2023-11-27 03:13:25,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3700806.6666666665, ans=0.125 2023-11-27 03:13:38,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3700873.3333333335, ans=0.1 2023-11-27 03:13:57,164 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555150 2023-11-27 03:14:00,289 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2050, loss[loss=0.06439, simple_loss=0.08483, pruned_loss=0.01233, audio_tagging_loss=0.009642, over 13219.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.08789, pruned_loss=0.01191, audio_tagging_loss=0.00857, over 3042640.23 frames. ], batch size: 52, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:14:42,100 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.04 vs. limit=10.0 2023-11-27 03:14:50,712 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.85 vs. limit=15.0 2023-11-27 03:14:52,361 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555200 2023-11-27 03:14:55,682 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2100, loss[loss=0.07551, simple_loss=0.09563, pruned_loss=0.01832, audio_tagging_loss=0.009371, over 15028.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08826, pruned_loss=0.01195, audio_tagging_loss=0.008577, over 3046780.80 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:14:56,450 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.80 vs. 
limit=22.5 2023-11-27 03:15:02,573 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.623e+01 8.818e+01 9.814e+01 1.041e+02 1.368e+02, threshold=1.963e+02, percent-clipped=0.0 2023-11-27 03:15:37,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3701540.0, ans=0.125 2023-11-27 03:15:37,880 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.27 vs. limit=15.0 2023-11-27 03:15:44,826 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.12 vs. limit=12.0 2023-11-27 03:15:46,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3701606.6666666665, ans=0.125 2023-11-27 03:15:46,701 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.02 vs. limit=15.0 2023-11-27 03:15:49,104 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555250 2023-11-27 03:15:52,281 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2150, loss[loss=0.08662, simple_loss=0.1196, pruned_loss=0.01779, audio_tagging_loss=0.009008, over 15716.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08876, pruned_loss=0.012, audio_tagging_loss=0.008593, over 3053760.96 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:15:59,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3701673.3333333335, ans=0.1 2023-11-27 03:16:24,273 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 03:16:31,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=3701873.3333333335, ans=0.1 2023-11-27 03:16:40,451 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.75 vs. limit=8.0 2023-11-27 03:16:45,542 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555300 2023-11-27 03:16:48,648 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2200, loss[loss=0.06059, simple_loss=0.07796, pruned_loss=0.0111, audio_tagging_loss=0.01051, over 15818.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08993, pruned_loss=0.01212, audio_tagging_loss=0.008511, over 3059038.75 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:16:55,030 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.053e+01 9.078e+01 9.706e+01 1.033e+02 2.180e+02, threshold=1.941e+02, percent-clipped=1.0 2023-11-27 03:16:59,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3702073.3333333335, ans=0.125 2023-11-27 03:17:01,099 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.60 vs. 
limit=22.5 2023-11-27 03:17:10,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3702140.0, ans=0.0 2023-11-27 03:17:12,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3702140.0, ans=0.125 2023-11-27 03:17:14,963 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.89 vs. limit=15.0 2023-11-27 03:17:25,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3702206.6666666665, ans=0.0 2023-11-27 03:17:25,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3702206.6666666665, ans=0.2 2023-11-27 03:17:29,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3702206.6666666665, ans=0.125 2023-11-27 03:17:32,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3702273.3333333335, ans=0.1 2023-11-27 03:17:40,540 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555350 2023-11-27 03:17:42,046 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.95 vs. limit=22.5 2023-11-27 03:17:43,601 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2250, loss[loss=0.07635, simple_loss=0.1013, pruned_loss=0.01517, audio_tagging_loss=0.01054, over 14412.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.09004, pruned_loss=0.01232, audio_tagging_loss=0.008584, over 3050473.63 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:18:05,898 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.71 vs. limit=15.0 2023-11-27 03:18:11,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3702473.3333333335, ans=0.1 2023-11-27 03:18:35,764 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555400 2023-11-27 03:18:39,678 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2300, loss[loss=0.08243, simple_loss=0.1138, pruned_loss=0.01476, audio_tagging_loss=0.01075, over 15178.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09023, pruned_loss=0.01235, audio_tagging_loss=0.008639, over 3050602.64 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:18:39,928 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 03:18:46,310 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.83 vs. 
limit=12.0 2023-11-27 03:18:46,633 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.925e+01 9.197e+01 9.925e+01 1.066e+02 1.274e+02, threshold=1.985e+02, percent-clipped=0.0 2023-11-27 03:18:53,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3702740.0, ans=0.1 2023-11-27 03:19:04,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3702806.6666666665, ans=0.04949747468305833 2023-11-27 03:19:07,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3702806.6666666665, ans=10.0 2023-11-27 03:19:10,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3702806.6666666665, ans=0.09899494936611666 2023-11-27 03:19:11,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3702873.3333333335, ans=0.2 2023-11-27 03:19:14,419 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.42 vs. limit=15.0 2023-11-27 03:19:27,973 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 03:19:31,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3702940.0, ans=0.125 2023-11-27 03:19:31,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3702940.0, ans=0.125 2023-11-27 03:19:32,261 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555450 2023-11-27 03:19:35,982 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2350, loss[loss=0.06348, simple_loss=0.08521, pruned_loss=0.01119, audio_tagging_loss=0.009691, over 15207.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08966, pruned_loss=0.01222, audio_tagging_loss=0.008684, over 3051602.37 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:19:41,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=3703006.6666666665, ans=15.0 2023-11-27 03:19:46,191 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.61 vs. limit=15.0 2023-11-27 03:19:51,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3703073.3333333335, ans=0.0 2023-11-27 03:19:58,886 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.39 vs. 
limit=6.0 2023-11-27 03:20:27,936 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555500 2023-11-27 03:20:29,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3703273.3333333335, ans=0.1 2023-11-27 03:20:31,098 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2400, loss[loss=0.06818, simple_loss=0.09507, pruned_loss=0.01173, audio_tagging_loss=0.008918, over 15722.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.09006, pruned_loss=0.01213, audio_tagging_loss=0.008768, over 3041721.33 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:20:37,430 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.851e+01 9.612e+01 1.018e+02 1.276e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-27 03:20:50,790 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.99 vs. limit=10.0 2023-11-27 03:21:02,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3703473.3333333335, ans=0.2 2023-11-27 03:21:18,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3703606.6666666665, ans=0.0 2023-11-27 03:21:23,243 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555550 2023-11-27 03:21:26,361 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2450, loss[loss=0.07753, simple_loss=0.1172, pruned_loss=0.0138, audio_tagging_loss=0.00513, over 16411.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08976, pruned_loss=0.01204, audio_tagging_loss=0.008871, over 3040427.38 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:21:38,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3703740.0, ans=0.2 2023-11-27 03:21:38,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3703740.0, ans=0.0 2023-11-27 03:21:55,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3703806.6666666665, ans=6.0 2023-11-27 03:22:09,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3703940.0, ans=0.125 2023-11-27 03:22:19,685 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555600 2023-11-27 03:22:23,111 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2500, loss[loss=0.08975, simple_loss=0.1258, pruned_loss=0.02129, audio_tagging_loss=0.005577, over 16460.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.09006, pruned_loss=0.0121, audio_tagging_loss=0.008892, over 3040396.23 frames. 
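The `grad_scale` field in the status records steps between 8.0, 16.0 and 32.0 over this section; that is the dynamic loss scale of mixed-precision training, which grows while steps succeed and halves when an overflow is detected. A minimal sketch with PyTorch's stock scaler (the training script may manage scaling differently; this shows the standard pattern behind such a log field):

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0)

def train_step(model, batch, optimizer):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(batch)            # forward in reduced precision where safe
    scaler.scale(loss).backward()      # backward on the scaled loss
    scaler.step(optimizer)             # skips the update on inf/nan gradients
    scaler.update()                    # grow the scale, or halve it on overflow
    return scaler.get_scale()          # the value logged as grad_scale
```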
], batch size: 59, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:22:30,032 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.789e+01 9.086e+01 9.685e+01 1.022e+02 1.331e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-27 03:22:35,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3704073.3333333335, ans=0.125 2023-11-27 03:22:37,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3704073.3333333335, ans=0.125 2023-11-27 03:22:45,376 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=22.5 2023-11-27 03:22:48,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3704140.0, ans=0.1 2023-11-27 03:22:50,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3704140.0, ans=0.0 2023-11-27 03:23:11,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3704273.3333333335, ans=0.0 2023-11-27 03:23:13,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3704273.3333333335, ans=0.1 2023-11-27 03:23:15,707 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555650 2023-11-27 03:23:18,858 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2550, loss[loss=0.08438, simple_loss=0.1133, pruned_loss=0.02148, audio_tagging_loss=0.006251, over 15345.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08949, pruned_loss=0.01206, audio_tagging_loss=0.008866, over 3045481.70 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:23:21,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3704340.0, ans=0.125 2023-11-27 03:23:37,970 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.56 vs. limit=15.0 2023-11-27 03:23:38,873 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.82 vs. limit=12.0 2023-11-27 03:23:39,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3704473.3333333335, ans=0.125 2023-11-27 03:23:42,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3704473.3333333335, ans=0.0 2023-11-27 03:24:01,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3704540.0, ans=0.125 2023-11-27 03:24:11,021 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555700 2023-11-27 03:24:13,854 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.43 vs. limit=22.5 2023-11-27 03:24:14,170 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2600, loss[loss=0.05396, simple_loss=0.0729, pruned_loss=0.008972, audio_tagging_loss=0.008537, over 15693.00 frames. 
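The `scaling.py:1118` records (`WithLoss: name=...self_attn_weights, loss-sum=0.000e+00`) track an auxiliary loss attached to attention weights; a sum of exactly zero means the penalty is currently inactive for those layers. One way to wire such a side loss without changing a module's return values is to collect penalties in a shared list and fold them into the main loss; a sketch of that pattern (illustrative only, with a made-up penalty; the actual scaling.py wiring is not reproduced here):

```python
import torch

class AuxLossCollector:
    def __init__(self):
        self.losses = []

    def attach(self, name: str, attn_weights: torch.Tensor,
               scale: float) -> torch.Tensor:
        # Example penalty: discourage attention mass collapsing onto one key.
        penalty = scale * (attn_weights.max(dim=-1).values ** 2).mean()
        self.losses.append(penalty)
        print(f"WithLoss: name={name}, loss-sum={float(penalty):.3e}")
        return attn_weights              # pass-through, like an identity wrapper

    def total(self) -> torch.Tensor:
        out = sum(self.losses, torch.tensor(0.0))
        self.losses.clear()
        return out

# total_loss = main_loss + collector.total() would fold the side losses in.
```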
], tot_loss[loss=0.06465, simple_loss=0.0884, pruned_loss=0.01181, audio_tagging_loss=0.00864, over 3045069.79 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:24:22,130 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 9.053e+01 9.535e+01 1.024e+02 1.234e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 03:24:47,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3704873.3333333335, ans=0.2 2023-11-27 03:24:50,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3704873.3333333335, ans=0.0 2023-11-27 03:25:07,321 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555750 2023-11-27 03:25:10,352 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2650, loss[loss=0.04444, simple_loss=0.05635, pruned_loss=0.006662, audio_tagging_loss=0.009604, over 16973.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08879, pruned_loss=0.01202, audio_tagging_loss=0.008593, over 3046899.06 frames. ], batch size: 63, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:25:11,621 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 03:26:02,924 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555800 2023-11-27 03:26:06,302 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2700, loss[loss=0.04854, simple_loss=0.06371, pruned_loss=0.006252, audio_tagging_loss=0.01043, over 14921.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.08815, pruned_loss=0.01192, audio_tagging_loss=0.008541, over 3043132.36 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:26:13,771 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 9.070e+01 9.755e+01 1.047e+02 1.495e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-27 03:26:33,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3705473.3333333335, ans=0.125 2023-11-27 03:26:58,382 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555850 2023-11-27 03:27:01,187 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.20 vs. limit=15.0 2023-11-27 03:27:01,517 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2750, loss[loss=0.07455, simple_loss=0.09572, pruned_loss=0.01663, audio_tagging_loss=0.01006, over 13835.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.08801, pruned_loss=0.01188, audio_tagging_loss=0.008548, over 3038171.82 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:27:02,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3705673.3333333335, ans=0.125 2023-11-27 03:27:05,297 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.08 vs. limit=12.0 2023-11-27 03:27:49,883 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 03:27:54,254 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555900 2023-11-27 03:27:55,820 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.25 vs. limit=12.0 2023-11-27 03:27:57,995 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2800, loss[loss=0.06149, simple_loss=0.08973, pruned_loss=0.01081, audio_tagging_loss=0.005812, over 16121.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08835, pruned_loss=0.01202, audio_tagging_loss=0.008509, over 3033659.99 frames. ], batch size: 62, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:28:05,482 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.51 vs. limit=6.0 2023-11-27 03:28:05,912 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.107e+01 8.970e+01 9.604e+01 1.036e+02 1.276e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-27 03:28:12,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3706073.3333333335, ans=0.2 2023-11-27 03:28:40,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3706206.6666666665, ans=0.125 2023-11-27 03:28:48,856 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.53 vs. limit=15.0 2023-11-27 03:28:50,582 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 555950 2023-11-27 03:28:53,862 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.78 vs. limit=15.0 2023-11-27 03:28:54,312 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2850, loss[loss=0.05719, simple_loss=0.07931, pruned_loss=0.009038, audio_tagging_loss=0.008498, over 14815.00 frames. ], tot_loss[loss=0.06436, simple_loss=0.08771, pruned_loss=0.01199, audio_tagging_loss=0.00851, over 3035532.94 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:29:03,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3706340.0, ans=0.09899494936611666 2023-11-27 03:29:04,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3706406.6666666665, ans=0.125 2023-11-27 03:29:14,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3706473.3333333335, ans=0.125 2023-11-27 03:29:19,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3706473.3333333335, ans=0.125 2023-11-27 03:29:26,864 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.89 vs. 
limit=15.0 2023-11-27 03:29:46,425 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556000 2023-11-27 03:29:47,760 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-556000.pt 2023-11-27 03:29:51,988 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2900, loss[loss=0.06872, simple_loss=0.09608, pruned_loss=0.01173, audio_tagging_loss=0.00896, over 15452.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08831, pruned_loss=0.01216, audio_tagging_loss=0.008556, over 3039559.89 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:29:59,419 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.629e+01 8.856e+01 9.574e+01 1.046e+02 1.351e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 03:30:03,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3706740.0, ans=0.125 2023-11-27 03:30:09,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3706740.0, ans=0.125 2023-11-27 03:30:16,021 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 03:30:20,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3706806.6666666665, ans=0.125 2023-11-27 03:30:26,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3706873.3333333335, ans=0.125 2023-11-27 03:30:38,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3706940.0, ans=0.125 2023-11-27 03:30:44,350 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556050 2023-11-27 03:30:47,478 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 2950, loss[loss=0.0723, simple_loss=0.09727, pruned_loss=0.01368, audio_tagging_loss=0.009981, over 15736.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08897, pruned_loss=0.01235, audio_tagging_loss=0.008571, over 3042309.71 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:31:06,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3707073.3333333335, ans=0.0 2023-11-27 03:31:10,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3707140.0, ans=0.125 2023-11-27 03:31:28,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3707206.6666666665, ans=0.0 2023-11-27 03:31:38,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3707273.3333333335, ans=0.125 2023-11-27 03:31:39,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3707273.3333333335, ans=0.1 2023-11-27 03:31:40,934 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556100 2023-11-27 03:31:44,035 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3000, loss[loss=0.05795, simple_loss=0.07748, pruned_loss=0.01007, audio_tagging_loss=0.00914, over 15365.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.0886, pruned_loss=0.01227, audio_tagging_loss=0.008697, over 3042848.46 frames. 
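The checkpoint written above lands on batch index 556000, a round multiple that suggests checkpoints are saved on a fixed batch interval independent of epoch boundaries. A sketch of that rule; the interval of 4000 and the saved payload are assumptions, only the checkpoint-<batch>.pt filename scheme is taken from the log:

    from pathlib import Path
    import torch

    def maybe_save_checkpoint(model, exp_dir: Path, batch_idx_train: int,
                              save_every_n: int = 4000) -> None:
        # Sketch: write checkpoint-<batch>.pt every save_every_n batches,
        # mirroring filenames such as checkpoint-556000.pt above.
        if batch_idx_train > 0 and batch_idx_train % save_every_n == 0:
            torch.save(
                {"model": model.state_dict(),
                 "batch_idx_train": batch_idx_train},
                exp_dir / f"checkpoint-{batch_idx_train}.pt",
            )
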
], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:31:44,037 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-27 03:32:16,620 INFO [train_asr.py:1267] (0/4) Epoch 47, validation: loss=0.05735, simple_loss=0.05053, pruned_loss=0.005352, audio_tagging_loss=0.02673, over 4681554.00 frames. 2023-11-27 03:32:16,621 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-27 03:32:16,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3707340.0, ans=0.05 2023-11-27 03:32:22,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3707340.0, ans=0.125 2023-11-27 03:32:25,490 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.754e+01 9.179e+01 9.770e+01 1.041e+02 1.490e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-27 03:32:30,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3707406.6666666665, ans=0.125 2023-11-27 03:32:45,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3707473.3333333335, ans=0.2 2023-11-27 03:32:54,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3707540.0, ans=0.0 2023-11-27 03:33:09,527 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556150 2023-11-27 03:33:13,152 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3050, loss[loss=0.06701, simple_loss=0.08375, pruned_loss=0.01256, audio_tagging_loss=0.01258, over 13787.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08941, pruned_loss=0.01245, audio_tagging_loss=0.008791, over 3045694.90 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:33:15,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3707673.3333333335, ans=0.125 2023-11-27 03:33:45,055 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 03:34:02,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3707940.0, ans=0.125 2023-11-27 03:34:05,877 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556200 2023-11-27 03:34:07,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=3707940.0, ans=22.5 2023-11-27 03:34:07,846 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.06 vs. limit=22.5 2023-11-27 03:34:09,375 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3100, loss[loss=0.1005, simple_loss=0.1482, pruned_loss=0.01972, audio_tagging_loss=0.006662, over 15660.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08961, pruned_loss=0.01244, audio_tagging_loss=0.008808, over 3044537.93 frames. 
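The validation pass above runs mid-epoch (here at batch 3000) and is followed by a report of peak GPU memory. A sketch of such a loop, assuming a generic model(batch) -> (loss, num_frames) interface rather than the actual train_asr.py signature:

    import torch

    @torch.no_grad()
    def compute_validation_loss(model, valid_loader) -> float:
        # Sketch: frame-weighted average loss over the dev set, as in the
        # "Epoch 47, validation: loss=..." record above.
        model.eval()
        loss_sum, frames = 0.0, 0.0
        for batch in valid_loader:
            loss, num_frames = model(batch)   # assumed interface
            loss_sum += float(loss) * num_frames
            frames += num_frames
        model.train()
        return loss_sum / max(frames, 1.0)

    # Peak memory, as in "Maximum memory allocated so far is 25978MB":
    peak_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
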
], batch size: 54, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:34:17,732 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.478e+01 9.052e+01 9.705e+01 1.059e+02 1.500e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-27 03:34:22,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3708073.3333333335, ans=0.0 2023-11-27 03:34:23,334 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2023-11-27 03:34:40,190 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.32 vs. limit=15.0 2023-11-27 03:35:01,576 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556250 2023-11-27 03:35:05,257 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3150, loss[loss=0.07442, simple_loss=0.1065, pruned_loss=0.0129, audio_tagging_loss=0.008294, over 14912.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08883, pruned_loss=0.0121, audio_tagging_loss=0.008895, over 3040847.15 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:35:10,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3708340.0, ans=0.04949747468305833 2023-11-27 03:35:10,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3708340.0, ans=0.125 2023-11-27 03:35:18,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3708406.6666666665, ans=0.0 2023-11-27 03:35:26,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3708406.6666666665, ans=0.125 2023-11-27 03:35:58,267 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556300 2023-11-27 03:36:00,799 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.51 vs. limit=15.0 2023-11-27 03:36:01,314 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3200, loss[loss=0.06268, simple_loss=0.08566, pruned_loss=0.01088, audio_tagging_loss=0.00896, over 15868.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08876, pruned_loss=0.01203, audio_tagging_loss=0.008988, over 3043568.38 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:36:05,367 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.54 vs. limit=15.0 2023-11-27 03:36:07,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3708673.3333333335, ans=0.125 2023-11-27 03:36:10,710 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.678e+01 8.966e+01 9.584e+01 1.017e+02 1.282e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-27 03:36:13,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3708740.0, ans=0.125 2023-11-27 03:36:22,147 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.36 vs. 
limit=12.0 2023-11-27 03:36:22,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3708806.6666666665, ans=0.035 2023-11-27 03:36:33,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3708873.3333333335, ans=0.1 2023-11-27 03:36:54,334 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556350 2023-11-27 03:36:57,423 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3250, loss[loss=0.07458, simple_loss=0.1062, pruned_loss=0.01363, audio_tagging_loss=0.007864, over 14513.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08927, pruned_loss=0.01208, audio_tagging_loss=0.009038, over 3044787.63 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:37:13,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3709073.3333333335, ans=0.1 2023-11-27 03:37:13,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3709073.3333333335, ans=0.2 2023-11-27 03:37:33,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3709206.6666666665, ans=0.0 2023-11-27 03:37:33,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=3709206.6666666665, ans=0.1 2023-11-27 03:37:39,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3709206.6666666665, ans=0.0 2023-11-27 03:37:42,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3709273.3333333335, ans=0.2 2023-11-27 03:37:44,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3709273.3333333335, ans=0.0 2023-11-27 03:37:49,587 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556400 2023-11-27 03:37:52,937 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3300, loss[loss=0.0662, simple_loss=0.08321, pruned_loss=0.01647, audio_tagging_loss=0.008131, over 14331.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08946, pruned_loss=0.01215, audio_tagging_loss=0.009134, over 3040560.18 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:37:54,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3709340.0, ans=0.0 2023-11-27 03:38:01,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3709340.0, ans=0.1 2023-11-27 03:38:02,994 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.558e+01 9.106e+01 9.727e+01 1.041e+02 1.146e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-27 03:38:08,266 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.48 vs. 
limit=10.0 2023-11-27 03:38:11,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3709406.6666666665, ans=0.125 2023-11-27 03:38:16,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3709473.3333333335, ans=0.125 2023-11-27 03:38:20,031 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.30 vs. limit=12.0 2023-11-27 03:38:27,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3709540.0, ans=0.2 2023-11-27 03:38:43,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3709606.6666666665, ans=0.125 2023-11-27 03:38:45,557 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556450 2023-11-27 03:38:49,252 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3350, loss[loss=0.06812, simple_loss=0.09004, pruned_loss=0.009638, audio_tagging_loss=0.01346, over 15539.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08988, pruned_loss=0.01213, audio_tagging_loss=0.008936, over 3049702.42 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:38:51,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3709673.3333333335, ans=0.2 2023-11-27 03:38:55,677 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.33 vs. limit=15.0 2023-11-27 03:39:04,785 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.88 vs. limit=6.0 2023-11-27 03:39:20,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3709806.6666666665, ans=0.125 2023-11-27 03:39:39,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3709940.0, ans=0.125 2023-11-27 03:39:42,792 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556500 2023-11-27 03:39:45,906 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3400, loss[loss=0.08689, simple_loss=0.127, pruned_loss=0.01629, audio_tagging_loss=0.007117, over 15701.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09029, pruned_loss=0.01216, audio_tagging_loss=0.008763, over 3045886.35 frames. 
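Each Whitening line compares a per-module anisotropy metric against a limit (the limit itself appears elsewhere in this log as a ScheduledFloat), with a corrective penalty presumably applied only while the metric exceeds the limit. One metric with the right behaviour, equal to 1.0 for a perfectly white feature covariance and growing as variance concentrates in few directions, is mean(lam^2) / mean(lam)^2 over the covariance eigenvalues lam; the sketch below uses that form and may differ in detail from scaling.py:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels), a module's output features.
        # Returns mean(lam^2) / mean(lam)^2 over the eigenvalues lam of the
        # covariance: 1.0 for white features, larger when anisotropic.
        x = x - x.mean(dim=0, keepdim=True)
        num_frames, num_channels = x.shape
        cov = (x.t() @ x) / num_frames               # (C, C)
        mean_eig = torch.diagonal(cov).mean()        # trace/C    = mean(lam)
        mean_sq = (cov * cov).sum() / num_channels   # |C|_F^2 /C = mean(lam^2)
        return mean_sq / (mean_eig ** 2 + 1e-20)

The num_groups=4 entries above suggest the channels are split into groups with the metric computed per group; the single-group case here corresponds to the num_groups=1 lines.
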
], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:39:47,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3710006.6666666665, ans=0.1 2023-11-27 03:39:50,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3710006.6666666665, ans=0.125 2023-11-27 03:39:55,331 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 9.012e+01 9.564e+01 1.021e+02 1.293e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-27 03:40:05,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3710073.3333333335, ans=0.0 2023-11-27 03:40:19,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3710206.6666666665, ans=0.125 2023-11-27 03:40:30,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3710273.3333333335, ans=0.0 2023-11-27 03:40:36,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3710273.3333333335, ans=0.0 2023-11-27 03:40:38,008 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556550 2023-11-27 03:40:41,077 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3450, loss[loss=0.06513, simple_loss=0.08719, pruned_loss=0.01024, audio_tagging_loss=0.01129, over 15795.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.09032, pruned_loss=0.01214, audio_tagging_loss=0.008665, over 3052069.54 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:40:42,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3710340.0, ans=0.125 2023-11-27 03:40:42,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3710340.0, ans=0.0 2023-11-27 03:40:51,406 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.10 vs. limit=10.0 2023-11-27 03:41:01,431 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 03:41:07,610 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.30 vs. limit=15.0 2023-11-27 03:41:09,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3710473.3333333335, ans=0.0 2023-11-27 03:41:10,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3710473.3333333335, ans=0.125 2023-11-27 03:41:12,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3710473.3333333335, ans=0.125 2023-11-27 03:41:16,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3710540.0, ans=0.2 2023-11-27 03:41:32,656 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556600 2023-11-27 03:41:36,599 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3500, loss[loss=0.06485, simple_loss=0.09387, pruned_loss=0.01009, audio_tagging_loss=0.007827, over 14632.00 frames. 
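In every Clipping_scale record the printed threshold is exactly 2.0 times the middle quartile, e.g. 2.0 * 9.564e+01 = 1.913e+02 just above, so the clipping threshold evidently tracks a running median of recent gradient norms scaled by Clipping_scale. A sketch of that rule; the window length is an assumption:

    from collections import deque
    import statistics

    class MedianClipper:
        """Sketch: clip when the grad norm exceeds clipping_scale x median."""

        def __init__(self, clipping_scale: float = 2.0, window: int = 400):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)    # window size is an assumption

        def threshold(self) -> float:
            return self.clipping_scale * statistics.median(self.norms)

        def observe(self, grad_norm: float) -> bool:
            self.norms.append(grad_norm)
            return grad_norm > self.threshold()  # True -> step gets clipped

percent-clipped=0.0 in these records then simply says that no recent gradient norm exceeded twice the median.
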
], tot_loss[loss=0.0657, simple_loss=0.09019, pruned_loss=0.01201, audio_tagging_loss=0.008591, over 3047004.06 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:41:47,261 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.834e+01 8.850e+01 9.448e+01 1.007e+02 1.285e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-27 03:42:05,871 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 03:42:26,958 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 03:42:29,907 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556650 2023-11-27 03:42:33,579 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3550, loss[loss=0.07397, simple_loss=0.109, pruned_loss=0.01414, audio_tagging_loss=0.005334, over 15273.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08992, pruned_loss=0.01209, audio_tagging_loss=0.008597, over 3044688.70 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:42:34,055 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.21 vs. limit=22.5 2023-11-27 03:42:45,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3711073.3333333335, ans=0.125 2023-11-27 03:42:46,964 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.51 vs. limit=15.0 2023-11-27 03:42:59,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3711140.0, ans=0.125 2023-11-27 03:43:11,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3711206.6666666665, ans=0.0 2023-11-27 03:43:25,852 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556700 2023-11-27 03:43:27,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3711273.3333333335, ans=0.125 2023-11-27 03:43:29,033 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3600, loss[loss=0.04871, simple_loss=0.05725, pruned_loss=0.008857, audio_tagging_loss=0.01123, over 15096.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08931, pruned_loss=0.01205, audio_tagging_loss=0.008521, over 3044243.92 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:43:35,102 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.70 vs. limit=15.0 2023-11-27 03:43:38,312 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.03 vs. 
limit=22.5 2023-11-27 03:43:38,650 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 8.614e+01 9.369e+01 1.008e+02 1.433e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-27 03:44:01,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3711540.0, ans=0.0 2023-11-27 03:44:10,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3711540.0, ans=0.1 2023-11-27 03:44:20,564 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556750 2023-11-27 03:44:21,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3711606.6666666665, ans=0.0 2023-11-27 03:44:23,834 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3650, loss[loss=0.06743, simple_loss=0.0969, pruned_loss=0.01325, audio_tagging_loss=0.005728, over 14972.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08986, pruned_loss=0.01226, audio_tagging_loss=0.008475, over 3045108.46 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:44:38,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3711740.0, ans=0.125 2023-11-27 03:44:42,567 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.97 vs. limit=15.0 2023-11-27 03:45:02,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3711873.3333333335, ans=0.125 2023-11-27 03:45:09,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3711940.0, ans=0.0 2023-11-27 03:45:13,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3711940.0, ans=0.04949747468305833 2023-11-27 03:45:15,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3711940.0, ans=0.2 2023-11-27 03:45:16,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3711940.0, ans=0.0 2023-11-27 03:45:17,583 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556800 2023-11-27 03:45:20,975 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3700, loss[loss=0.07426, simple_loss=0.09316, pruned_loss=0.01504, audio_tagging_loss=0.01264, over 15534.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08987, pruned_loss=0.01226, audio_tagging_loss=0.008526, over 3046940.53 frames. 
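The ScheduledFloat lines report the current value (ans=...) of hyper-parameters such as dropout_p and the various skip rates at the current batch_count, i.e. these are piecewise functions of training progress rather than constants. A minimal sketch of such a schedule; the breakpoints below are illustrative only, the real ones live in scaling.py:

    class ScheduledFloatSketch:
        """Piecewise-linear hyper-parameter value as a function of batch count."""

        def __init__(self, *points):
            # points: (batch_count, value) pairs.
            self.points = sorted(points)

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)
            return pts[-1][1]

    # Illustrative breakpoints; far past the last one the value is constant,
    # which is why values like ans=0.1 no longer move at batch_count ~ 3.7e6.
    dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p(3711540.0))   # -> 0.1
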
], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:45:26,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3712006.6666666665, ans=0.1 2023-11-27 03:45:31,046 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.710e+01 9.060e+01 9.619e+01 1.026e+02 1.251e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-27 03:45:40,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3712073.3333333335, ans=0.0 2023-11-27 03:45:59,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3712206.6666666665, ans=15.0 2023-11-27 03:46:01,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3712206.6666666665, ans=0.125 2023-11-27 03:46:13,704 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556850 2023-11-27 03:46:16,766 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3750, loss[loss=0.06436, simple_loss=0.09299, pruned_loss=0.01022, audio_tagging_loss=0.007645, over 15070.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09131, pruned_loss=0.01248, audio_tagging_loss=0.008469, over 3052870.61 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:46:17,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3712340.0, ans=0.0 2023-11-27 03:46:23,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3712340.0, ans=0.0 2023-11-27 03:46:25,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3712340.0, ans=0.2 2023-11-27 03:46:48,204 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.50 vs. limit=15.0 2023-11-27 03:46:54,972 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 03:47:08,947 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556900 2023-11-27 03:47:12,066 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3800, loss[loss=0.05834, simple_loss=0.0806, pruned_loss=0.009586, audio_tagging_loss=0.008451, over 13931.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.0899, pruned_loss=0.01217, audio_tagging_loss=0.008596, over 3057116.79 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:47:24,840 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.007e+01 9.077e+01 9.693e+01 1.049e+02 1.287e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-27 03:47:42,382 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.34 vs. 
limit=6.0 2023-11-27 03:47:45,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3712873.3333333335, ans=0.0 2023-11-27 03:47:55,924 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.66 vs. limit=15.0 2023-11-27 03:48:05,197 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 556950 2023-11-27 03:48:08,321 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3850, loss[loss=0.06073, simple_loss=0.0796, pruned_loss=0.01019, audio_tagging_loss=0.01074, over 13828.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08982, pruned_loss=0.01222, audio_tagging_loss=0.008649, over 3051123.33 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 03:48:14,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3713006.6666666665, ans=15.0 2023-11-27 03:48:19,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3713073.3333333335, ans=0.1 2023-11-27 03:48:35,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3713140.0, ans=0.1 2023-11-27 03:48:38,525 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.61 vs. limit=10.0 2023-11-27 03:48:48,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3713206.6666666665, ans=0.1 2023-11-27 03:48:49,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3713206.6666666665, ans=0.125 2023-11-27 03:49:01,539 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557000 2023-11-27 03:49:05,033 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3900, loss[loss=0.06922, simple_loss=0.09708, pruned_loss=0.01351, audio_tagging_loss=0.007179, over 16086.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08981, pruned_loss=0.01228, audio_tagging_loss=0.008646, over 3048948.36 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 03:49:06,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3713340.0, ans=0.125 2023-11-27 03:49:16,693 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.780e+01 9.076e+01 9.575e+01 1.020e+02 1.197e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 03:49:52,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3713606.6666666665, ans=0.07 2023-11-27 03:49:57,119 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557050 2023-11-27 03:50:00,206 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 3950, loss[loss=0.06269, simple_loss=0.0862, pruned_loss=0.01187, audio_tagging_loss=0.007717, over 14659.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08974, pruned_loss=0.01242, audio_tagging_loss=0.008737, over 3041567.45 frames. 
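grad_scale in these records moves between 32, 16 and 8: with fp16 training the loss scale is halved whenever scaled gradients overflow and grown back after a run of clean steps. A sketch using torch.cuda.amp; init_scale matches the log, while the growth parameters are PyTorch defaults assumed for this run:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0,        # matches the grad_scale values in the log
        growth_factor=2.0,      # doubled after growth_interval clean steps
        backoff_factor=0.5,     # halved on inf/nan gradients (32 -> 16 -> 8)
        growth_interval=2000,   # PyTorch default, assumed for this run
    )

    def train_step(model, optimizer, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch)        # assumed: model returns a scalar loss
        scaler.scale(loss).backward()
        scaler.step(optimizer)         # skips the update if grads overflowed
        scaler.update()                # adjusts the scale logged as grad_scale
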
], batch size: 56, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 03:50:05,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3713673.3333333335, ans=0.125 2023-11-27 03:50:17,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3713740.0, ans=0.125 2023-11-27 03:50:21,304 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.43 vs. limit=22.5 2023-11-27 03:50:37,766 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.23 vs. limit=15.0 2023-11-27 03:50:44,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3713940.0, ans=0.0 2023-11-27 03:50:52,431 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557100 2023-11-27 03:50:56,112 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4000, loss[loss=0.06472, simple_loss=0.07796, pruned_loss=0.01439, audio_tagging_loss=0.01135, over 14652.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08954, pruned_loss=0.01222, audio_tagging_loss=0.008851, over 3035470.65 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:51:08,814 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.068e+01 9.327e+01 9.767e+01 1.033e+02 1.414e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-27 03:51:33,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3714206.6666666665, ans=0.2 2023-11-27 03:51:34,959 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 03:51:48,494 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557150 2023-11-27 03:51:52,053 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4050, loss[loss=0.06308, simple_loss=0.08588, pruned_loss=0.01327, audio_tagging_loss=0.006865, over 14941.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08977, pruned_loss=0.01233, audio_tagging_loss=0.008878, over 3034578.62 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:51:54,212 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 03:51:54,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3714340.0, ans=0.125 2023-11-27 03:52:25,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3714540.0, ans=0.125 2023-11-27 03:52:31,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3714540.0, ans=0.125 2023-11-27 03:52:38,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3714606.6666666665, ans=0.125 2023-11-27 03:52:44,068 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557200 2023-11-27 03:52:47,541 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4100, loss[loss=0.05786, simple_loss=0.07163, pruned_loss=0.008226, audio_tagging_loss=0.01382, over 15511.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08946, pruned_loss=0.0122, audio_tagging_loss=0.008927, over 3039970.29 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:52:59,707 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.067e+01 9.062e+01 9.739e+01 1.030e+02 1.331e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-27 03:53:02,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3714740.0, ans=0.0 2023-11-27 03:53:22,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3714873.3333333335, ans=0.0 2023-11-27 03:53:25,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3714873.3333333335, ans=0.125 2023-11-27 03:53:29,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3714873.3333333335, ans=0.0 2023-11-27 03:53:40,363 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557250 2023-11-27 03:53:43,487 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4150, loss[loss=0.05788, simple_loss=0.07733, pruned_loss=0.009605, audio_tagging_loss=0.009612, over 15245.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08971, pruned_loss=0.01215, audio_tagging_loss=0.008769, over 3038187.86 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:53:59,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3715073.3333333335, ans=0.125 2023-11-27 03:54:00,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3715073.3333333335, ans=0.125 2023-11-27 03:54:05,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3715140.0, ans=0.07 2023-11-27 03:54:14,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3715140.0, ans=0.125 2023-11-27 03:54:24,030 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 03:54:36,715 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557300 2023-11-27 03:54:39,299 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2023-11-27 03:54:39,857 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4200, loss[loss=0.1124, simple_loss=0.1564, pruned_loss=0.02856, audio_tagging_loss=0.005661, over 14787.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08961, pruned_loss=0.01213, audio_tagging_loss=0.008623, over 3038722.04 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:54:40,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3715340.0, ans=0.125 2023-11-27 03:54:51,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3715406.6666666665, ans=0.125 2023-11-27 03:54:51,931 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.295e+01 8.957e+01 9.619e+01 1.045e+02 2.364e+02, threshold=1.924e+02, percent-clipped=1.0 2023-11-27 03:54:54,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3715406.6666666665, ans=0.125 2023-11-27 03:55:02,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=3715473.3333333335, ans=6.0 2023-11-27 03:55:02,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3715473.3333333335, ans=0.125 2023-11-27 03:55:04,209 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.72 vs. limit=15.0 2023-11-27 03:55:11,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3715473.3333333335, ans=0.125 2023-11-27 03:55:20,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3715540.0, ans=0.0 2023-11-27 03:55:24,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3715606.6666666665, ans=0.125 2023-11-27 03:55:32,775 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557350 2023-11-27 03:55:35,910 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4250, loss[loss=0.08546, simple_loss=0.129, pruned_loss=0.01506, audio_tagging_loss=0.005914, over 15583.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08946, pruned_loss=0.01218, audio_tagging_loss=0.008608, over 3040125.87 frames. 
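The Exclude-cut warnings drop AudioSet clips whose placeholder transcript (24 BPE tokens) is longer than the 23 encoder frames that survive 4x subsampling; a transducer alignment needs at least one frame per token. A sketch of that filter; the "- 2" edge correction is an assumption chosen to reproduce the logged 100 -> 23 frames:

    def keep_cut(num_frames: int, num_tokens: int,
                 subsampling_factor: int = 4) -> bool:
        # 100 input frames -> 23 after subsampling in the log, slightly fewer
        # than 100 // 4; the "- 2" stands in for those edge effects.
        frames_after = num_frames // subsampling_factor - 2
        return frames_after >= num_tokens

    print(keep_cut(100, 24))   # False: the cuts excluded above
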
], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:55:38,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3715673.3333333335, ans=0.2 2023-11-27 03:55:38,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.whiten.whitening_limit, batch_count=3715673.3333333335, ans=15.0 2023-11-27 03:55:44,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3715673.3333333335, ans=0.0 2023-11-27 03:55:52,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3715740.0, ans=0.0 2023-11-27 03:55:54,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3715740.0, ans=0.0 2023-11-27 03:55:56,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3715740.0, ans=0.2 2023-11-27 03:55:58,078 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2023-11-27 03:56:08,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3715873.3333333335, ans=0.0 2023-11-27 03:56:17,038 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.32 vs. limit=12.0 2023-11-27 03:56:18,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3715873.3333333335, ans=0.125 2023-11-27 03:56:20,529 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.42 vs. limit=8.0 2023-11-27 03:56:28,284 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557400 2023-11-27 03:56:28,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3715940.0, ans=0.0 2023-11-27 03:56:31,982 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4300, loss[loss=0.06831, simple_loss=0.09709, pruned_loss=0.01048, audio_tagging_loss=0.009283, over 15264.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08931, pruned_loss=0.01213, audio_tagging_loss=0.008573, over 3042537.71 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:56:32,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3716006.6666666665, ans=0.125 2023-11-27 03:56:42,966 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.76 vs. 
limit=12.0 2023-11-27 03:56:44,662 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.985e+01 9.063e+01 9.742e+01 1.048e+02 1.434e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-27 03:56:46,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3716073.3333333335, ans=0.125 2023-11-27 03:56:50,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3716073.3333333335, ans=0.125 2023-11-27 03:57:00,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3716140.0, ans=0.1 2023-11-27 03:57:24,953 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557450 2023-11-27 03:57:28,060 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4350, loss[loss=0.05337, simple_loss=0.06513, pruned_loss=0.009655, audio_tagging_loss=0.01115, over 14361.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08979, pruned_loss=0.01213, audio_tagging_loss=0.008587, over 3037916.67 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:58:01,965 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.27 vs. limit=15.0 2023-11-27 03:58:13,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3716606.6666666665, ans=0.1 2023-11-27 03:58:17,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3716606.6666666665, ans=0.125 2023-11-27 03:58:20,065 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557500 2023-11-27 03:58:23,186 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4400, loss[loss=0.0763, simple_loss=0.1089, pruned_loss=0.01364, audio_tagging_loss=0.008193, over 15499.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.09, pruned_loss=0.01204, audio_tagging_loss=0.008507, over 3040307.68 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:58:31,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3716673.3333333335, ans=0.125 2023-11-27 03:58:35,347 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.054e+01 9.094e+01 9.740e+01 1.025e+02 1.251e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-27 03:58:36,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3716740.0, ans=0.0 2023-11-27 03:58:40,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3716740.0, ans=0.0 2023-11-27 03:58:46,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3716806.6666666665, ans=0.2 2023-11-27 03:59:15,452 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557550 2023-11-27 03:59:17,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3717006.6666666665, ans=0.09899494936611666 2023-11-27 03:59:18,574 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4450, loss[loss=0.06519, simple_loss=0.09575, pruned_loss=0.01141, audio_tagging_loss=0.005904, over 14576.00 frames. 
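tot_loss[..., over ~3.04e6 frames] aggregates across batches while the per-batch loss[...] covers roughly 15k frames, and the aggregate frame count hovers near 3.0M instead of growing without bound, which points to a frame-weighted average with exponential decay. A sketch; the decay constant is an assumption picked so that 15k-frame batches settle near 15e3 / (1 - 0.995) = 3.0e6 frames:

    class RunningLoss:
        """Sketch of the tot_loss[...] aggregate: decayed frame-weighted mean."""

        def __init__(self, decay: float = 0.995):   # decay is an assumption
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> None:
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames

        @property
        def value(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)
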
], tot_loss[loss=0.06496, simple_loss=0.0891, pruned_loss=0.01192, audio_tagging_loss=0.008488, over 3038872.46 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:59:18,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3717006.6666666665, ans=0.0 2023-11-27 03:59:39,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3717073.3333333335, ans=0.0 2023-11-27 04:00:11,596 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557600 2023-11-27 04:00:15,456 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4500, loss[loss=0.06021, simple_loss=0.08411, pruned_loss=0.008946, audio_tagging_loss=0.009204, over 15907.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08972, pruned_loss=0.01195, audio_tagging_loss=0.008482, over 3044678.87 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:00:27,190 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.280e+01 9.128e+01 9.724e+01 1.026e+02 1.221e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-27 04:00:35,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3717473.3333333335, ans=0.05 2023-11-27 04:00:45,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3717473.3333333335, ans=0.125 2023-11-27 04:00:56,430 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.73 vs. limit=15.0 2023-11-27 04:00:58,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3717540.0, ans=0.2 2023-11-27 04:01:00,999 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.02 vs. limit=15.0 2023-11-27 04:01:07,861 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557650 2023-11-27 04:01:11,018 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4550, loss[loss=0.0412, simple_loss=0.05142, pruned_loss=0.00555, audio_tagging_loss=0.009942, over 16375.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08978, pruned_loss=0.01202, audio_tagging_loss=0.008471, over 3042321.13 frames. ], batch size: 63, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:01:40,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3717806.6666666665, ans=0.125 2023-11-27 04:01:47,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3717873.3333333335, ans=0.0 2023-11-27 04:01:53,976 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 04:01:55,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3717940.0, ans=0.0 2023-11-27 04:02:03,750 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557700 2023-11-27 04:02:07,393 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4600, loss[loss=0.08379, simple_loss=0.1167, pruned_loss=0.01725, audio_tagging_loss=0.00818, over 15158.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.0901, pruned_loss=0.01197, audio_tagging_loss=0.008416, over 3041729.31 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:02:13,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3718006.6666666665, ans=0.125 2023-11-27 04:02:20,756 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.735e+01 9.058e+01 9.556e+01 1.027e+02 1.489e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-27 04:02:31,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3718140.0, ans=0.0 2023-11-27 04:03:00,541 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557750 2023-11-27 04:03:02,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3718340.0, ans=0.1 2023-11-27 04:03:04,148 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4650, loss[loss=0.07035, simple_loss=0.09743, pruned_loss=0.0139, audio_tagging_loss=0.007735, over 15098.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08962, pruned_loss=0.01192, audio_tagging_loss=0.008593, over 3046477.52 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:03:10,108 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.79 vs. limit=15.0 2023-11-27 04:03:28,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3718473.3333333335, ans=0.125 2023-11-27 04:03:39,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3718540.0, ans=0.0 2023-11-27 04:03:44,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3718540.0, ans=0.125 2023-11-27 04:03:51,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3718606.6666666665, ans=0.2 2023-11-27 04:03:52,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3718606.6666666665, ans=0.125 2023-11-27 04:03:56,140 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557800 2023-11-27 04:03:59,560 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4700, loss[loss=0.06741, simple_loss=0.08595, pruned_loss=0.01349, audio_tagging_loss=0.01095, over 15199.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08943, pruned_loss=0.01197, audio_tagging_loss=0.008749, over 3044192.98 frames. 
], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:04:00,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3718673.3333333335, ans=0.0 2023-11-27 04:04:02,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3718673.3333333335, ans=0.125 2023-11-27 04:04:11,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3718740.0, ans=0.1 2023-11-27 04:04:12,170 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.864e+01 9.072e+01 9.943e+01 1.043e+02 1.382e+02, threshold=1.989e+02, percent-clipped=0.0 2023-11-27 04:04:20,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3718806.6666666665, ans=0.2 2023-11-27 04:04:45,103 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.43 vs. limit=15.0 2023-11-27 04:04:48,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3718940.0, ans=0.125 2023-11-27 04:04:50,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3718940.0, ans=0.1 2023-11-27 04:04:51,008 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557850 2023-11-27 04:04:54,127 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4750, loss[loss=0.04614, simple_loss=0.05808, pruned_loss=0.007172, audio_tagging_loss=0.009926, over 15002.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08893, pruned_loss=0.01177, audio_tagging_loss=0.008816, over 3051903.78 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:05:00,818 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:05:10,173 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:05:10,660 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.43 vs. limit=10.0 2023-11-27 04:05:13,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=3719073.3333333335, ans=15.0 2023-11-27 04:05:33,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3719206.6666666665, ans=0.125 2023-11-27 04:05:40,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3719273.3333333335, ans=0.0 2023-11-27 04:05:45,591 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.20 vs. limit=22.5 2023-11-27 04:05:47,630 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557900 2023-11-27 04:05:48,793 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:05:50,732 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4800, loss[loss=0.05908, simple_loss=0.07824, pruned_loss=0.007845, audio_tagging_loss=0.01212, over 15449.00 frames. 
], tot_loss[loss=0.06528, simple_loss=0.08899, pruned_loss=0.01187, audio_tagging_loss=0.008912, over 3050927.59 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:05:51,209 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.48 vs. limit=15.0 2023-11-27 04:05:59,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3719340.0, ans=0.2 2023-11-27 04:06:05,053 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.356e+01 9.052e+01 9.526e+01 1.032e+02 1.738e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-27 04:06:08,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3719406.6666666665, ans=0.125 2023-11-27 04:06:41,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3719606.6666666665, ans=0.0 2023-11-27 04:06:43,259 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 557950 2023-11-27 04:06:46,397 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4850, loss[loss=0.05972, simple_loss=0.08304, pruned_loss=0.009085, audio_tagging_loss=0.009121, over 17022.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08973, pruned_loss=0.01193, audio_tagging_loss=0.0089, over 3056220.17 frames. ], batch size: 66, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:06:54,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3719673.3333333335, ans=0.125 2023-11-27 04:07:04,091 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.65 vs. limit=15.0 2023-11-27 04:07:05,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3719740.0, ans=0.2 2023-11-27 04:07:13,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3719806.6666666665, ans=0.2 2023-11-27 04:07:15,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3719806.6666666665, ans=0.125 2023-11-27 04:07:18,177 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.79 vs. limit=15.0 2023-11-27 04:07:28,949 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.65 vs. limit=22.5 2023-11-27 04:07:38,010 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558000 2023-11-27 04:07:41,444 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4900, loss[loss=0.06376, simple_loss=0.09282, pruned_loss=0.009336, audio_tagging_loss=0.00801, over 15834.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.09011, pruned_loss=0.01196, audio_tagging_loss=0.008737, over 3053123.65 frames. 
], batch size: 61, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:07:43,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3720006.6666666665, ans=0.2 2023-11-27 04:07:45,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3720006.6666666665, ans=0.125 2023-11-27 04:07:56,762 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.179e+01 8.861e+01 9.533e+01 1.009e+02 1.253e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 04:08:03,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3720140.0, ans=0.125 2023-11-27 04:08:22,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3720206.6666666665, ans=0.125 2023-11-27 04:08:27,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3720273.3333333335, ans=0.0 2023-11-27 04:08:27,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3720273.3333333335, ans=0.0 2023-11-27 04:08:33,867 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558050 2023-11-27 04:08:37,515 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 4950, loss[loss=0.07577, simple_loss=0.1042, pruned_loss=0.01481, audio_tagging_loss=0.008874, over 14604.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.09026, pruned_loss=0.01196, audio_tagging_loss=0.008648, over 3054855.48 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:08:39,478 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.88 vs. limit=12.0 2023-11-27 04:08:46,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3720340.0, ans=0.1 2023-11-27 04:09:10,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3720540.0, ans=0.125 2023-11-27 04:09:31,060 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558100 2023-11-27 04:09:34,158 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5000, loss[loss=0.05398, simple_loss=0.07812, pruned_loss=0.00829, audio_tagging_loss=0.006631, over 15074.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.09027, pruned_loss=0.01207, audio_tagging_loss=0.008517, over 3051774.68 frames. 
], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:09:42,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3720673.3333333335, ans=0.125 2023-11-27 04:09:46,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3720740.0, ans=0.125 2023-11-27 04:09:47,968 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.742e+01 8.780e+01 9.541e+01 1.005e+02 1.452e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-27 04:10:04,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3720806.6666666665, ans=0.0 2023-11-27 04:10:08,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3720873.3333333335, ans=0.125 2023-11-27 04:10:25,997 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558150 2023-11-27 04:10:29,097 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5050, loss[loss=0.05967, simple_loss=0.08532, pruned_loss=0.01114, audio_tagging_loss=0.005867, over 16281.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08996, pruned_loss=0.01211, audio_tagging_loss=0.008449, over 3053149.74 frames. ], batch size: 63, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:10:31,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3721006.6666666665, ans=0.125 2023-11-27 04:10:39,816 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.45 vs. limit=15.0 2023-11-27 04:11:21,956 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558200 2023-11-27 04:11:25,288 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5100, loss[loss=0.06615, simple_loss=0.09863, pruned_loss=0.01144, audio_tagging_loss=0.005393, over 15882.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08996, pruned_loss=0.0119, audio_tagging_loss=0.008409, over 3056902.44 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:11:40,665 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.478e+01 8.877e+01 9.521e+01 1.045e+02 1.362e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-27 04:12:17,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3721606.6666666665, ans=0.1 2023-11-27 04:12:19,043 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558250 2023-11-27 04:12:22,228 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5150, loss[loss=0.06276, simple_loss=0.0772, pruned_loss=0.01626, audio_tagging_loss=0.007899, over 15804.00 frames. ], tot_loss[loss=0.06452, simple_loss=0.08895, pruned_loss=0.01166, audio_tagging_loss=0.008392, over 3058736.40 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:12:26,113 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.95 vs. 
limit=15.0 2023-11-27 04:12:28,736 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:12:30,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3721673.3333333335, ans=0.125 2023-11-27 04:12:55,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3721873.3333333335, ans=0.07 2023-11-27 04:13:03,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3721873.3333333335, ans=0.1 2023-11-27 04:13:14,244 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558300 2023-11-27 04:13:16,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3722006.6666666665, ans=0.1 2023-11-27 04:13:17,295 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5200, loss[loss=0.05626, simple_loss=0.07628, pruned_loss=0.006489, audio_tagging_loss=0.01163, over 15425.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08895, pruned_loss=0.01172, audio_tagging_loss=0.008474, over 3056620.39 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:13:31,585 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.764e+01 8.871e+01 9.548e+01 1.019e+02 1.156e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-27 04:13:33,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3722073.3333333335, ans=0.1 2023-11-27 04:13:37,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3722073.3333333335, ans=0.125 2023-11-27 04:13:58,324 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.40 vs. limit=15.0 2023-11-27 04:14:09,361 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558350 2023-11-27 04:14:09,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3722273.3333333335, ans=0.125 2023-11-27 04:14:13,083 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5250, loss[loss=0.05595, simple_loss=0.07556, pruned_loss=0.008826, audio_tagging_loss=0.009346, over 16511.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08937, pruned_loss=0.01179, audio_tagging_loss=0.008476, over 3057769.82 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:14:18,558 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.61 vs. 
limit=15.0 2023-11-27 04:14:39,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3722473.3333333335, ans=0.1 2023-11-27 04:14:45,663 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:14:46,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3722540.0, ans=0.125 2023-11-27 04:14:46,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3722540.0, ans=0.125 2023-11-27 04:14:48,172 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.63 vs. limit=22.5 2023-11-27 04:15:02,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3722606.6666666665, ans=0.0 2023-11-27 04:15:05,995 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558400 2023-11-27 04:15:09,454 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5300, loss[loss=0.08474, simple_loss=0.1346, pruned_loss=0.0132, audio_tagging_loss=0.004245, over 16671.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.0889, pruned_loss=0.01183, audio_tagging_loss=0.008521, over 3048967.23 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:15:24,824 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 9.122e+01 9.674e+01 1.051e+02 1.467e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-27 04:15:35,226 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.08 vs. limit=22.5 2023-11-27 04:15:43,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3722873.3333333335, ans=0.0 2023-11-27 04:15:55,580 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:16:02,273 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558450 2023-11-27 04:16:05,423 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5350, loss[loss=0.0563, simple_loss=0.0732, pruned_loss=0.008392, audio_tagging_loss=0.01131, over 14647.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08893, pruned_loss=0.01176, audio_tagging_loss=0.008576, over 3043600.72 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:16:13,870 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:16:46,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3723206.6666666665, ans=0.125 2023-11-27 04:16:51,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3723273.3333333335, ans=0.1 2023-11-27 04:16:52,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3723273.3333333335, ans=0.1 2023-11-27 04:16:54,510 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.84 vs. 
limit=6.0 2023-11-27 04:16:57,324 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558500 2023-11-27 04:17:00,458 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5400, loss[loss=0.06533, simple_loss=0.09079, pruned_loss=0.007741, audio_tagging_loss=0.01219, over 15441.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.09031, pruned_loss=0.012, audio_tagging_loss=0.008548, over 3041339.63 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:17:16,908 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.704e+01 9.000e+01 9.552e+01 1.029e+02 2.043e+02, threshold=1.910e+02, percent-clipped=1.0 2023-11-27 04:17:24,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3723473.3333333335, ans=0.0 2023-11-27 04:17:42,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3723540.0, ans=0.125 2023-11-27 04:17:46,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3723606.6666666665, ans=0.0 2023-11-27 04:17:53,670 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558550 2023-11-27 04:17:57,040 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.99 vs. limit=15.0 2023-11-27 04:17:57,349 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5450, loss[loss=0.06505, simple_loss=0.09629, pruned_loss=0.0111, audio_tagging_loss=0.005801, over 14336.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08922, pruned_loss=0.01193, audio_tagging_loss=0.008626, over 3037442.89 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:18:01,262 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.03 vs. limit=15.0 2023-11-27 04:18:11,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3723740.0, ans=0.0 2023-11-27 04:18:11,945 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.72 vs. limit=22.5 2023-11-27 04:18:13,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3723740.0, ans=0.0 2023-11-27 04:18:21,738 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0 2023-11-27 04:18:29,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3723873.3333333335, ans=0.05 2023-11-27 04:18:31,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3723873.3333333335, ans=0.125 2023-11-27 04:18:33,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3723873.3333333335, ans=0.125 2023-11-27 04:18:37,431 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.80 vs. 
limit=22.5 2023-11-27 04:18:40,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3723940.0, ans=0.125 2023-11-27 04:18:49,208 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558600 2023-11-27 04:18:52,633 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5500, loss[loss=0.06643, simple_loss=0.08629, pruned_loss=0.01339, audio_tagging_loss=0.009896, over 14273.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08963, pruned_loss=0.01228, audio_tagging_loss=0.008665, over 3039178.95 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:18:55,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3724006.6666666665, ans=0.125 2023-11-27 04:19:07,815 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 9.127e+01 9.786e+01 1.057e+02 1.357e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-27 04:19:37,171 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.82 vs. limit=22.5 2023-11-27 04:19:41,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3724273.3333333335, ans=0.2 2023-11-27 04:19:45,213 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558650 2023-11-27 04:19:46,792 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.42 vs. limit=22.5 2023-11-27 04:19:48,351 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5550, loss[loss=0.07307, simple_loss=0.1005, pruned_loss=0.01375, audio_tagging_loss=0.009057, over 14874.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08908, pruned_loss=0.01211, audio_tagging_loss=0.008807, over 3043705.55 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:20:11,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3724473.3333333335, ans=0.125 2023-11-27 04:20:20,774 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.77 vs. limit=15.0 2023-11-27 04:20:32,413 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.30 vs. limit=15.0 2023-11-27 04:20:33,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3724606.6666666665, ans=0.0 2023-11-27 04:20:41,513 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558700 2023-11-27 04:20:44,609 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5600, loss[loss=0.05174, simple_loss=0.06407, pruned_loss=0.009285, audio_tagging_loss=0.01042, over 14335.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08886, pruned_loss=0.01204, audio_tagging_loss=0.008848, over 3053121.69 frames. 
], batch size: 54, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:20:53,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3724673.3333333335, ans=0.0 2023-11-27 04:20:59,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3724740.0, ans=0.125 2023-11-27 04:21:00,061 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.072e+01 9.029e+01 9.775e+01 1.051e+02 1.247e+02, threshold=1.955e+02, percent-clipped=0.0 2023-11-27 04:21:11,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3724806.6666666665, ans=0.125 2023-11-27 04:21:23,807 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 04:21:25,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3724873.3333333335, ans=0.125 2023-11-27 04:21:26,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3724873.3333333335, ans=0.025 2023-11-27 04:21:37,132 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558750 2023-11-27 04:21:40,273 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5650, loss[loss=0.06589, simple_loss=0.07746, pruned_loss=0.01383, audio_tagging_loss=0.01333, over 16151.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08873, pruned_loss=0.01202, audio_tagging_loss=0.008911, over 3047857.33 frames. 
], batch size: 61, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:21:44,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3725006.6666666665, ans=0.125 2023-11-27 04:21:47,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3725006.6666666665, ans=0.125 2023-11-27 04:21:49,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3725006.6666666665, ans=0.125 2023-11-27 04:21:56,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3725073.3333333335, ans=0.1 2023-11-27 04:22:08,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3725140.0, ans=0.125 2023-11-27 04:22:16,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3725206.6666666665, ans=10.0 2023-11-27 04:22:17,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3725206.6666666665, ans=0.0 2023-11-27 04:22:33,013 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558800 2023-11-27 04:22:36,435 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5700, loss[loss=0.04865, simple_loss=0.06642, pruned_loss=0.005872, audio_tagging_loss=0.009566, over 14784.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08883, pruned_loss=0.012, audio_tagging_loss=0.00882, over 3051024.45 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:22:44,154 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.08 vs. limit=6.0 2023-11-27 04:22:53,742 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.360e+01 9.100e+01 9.627e+01 1.012e+02 1.597e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-27 04:22:57,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3725406.6666666665, ans=0.1 2023-11-27 04:23:01,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3725473.3333333335, ans=0.07 2023-11-27 04:23:21,255 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.35 vs. limit=15.0 2023-11-27 04:23:23,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3725606.6666666665, ans=0.125 2023-11-27 04:23:26,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3725606.6666666665, ans=0.2 2023-11-27 04:23:28,770 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558850 2023-11-27 04:23:32,480 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5750, loss[loss=0.07657, simple_loss=0.1039, pruned_loss=0.01527, audio_tagging_loss=0.009379, over 15422.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08871, pruned_loss=0.01191, audio_tagging_loss=0.008685, over 3046819.22 frames. 
], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:23:40,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3725673.3333333335, ans=0.125 2023-11-27 04:23:45,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3725740.0, ans=0.5 2023-11-27 04:23:46,600 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.87 vs. limit=15.0 2023-11-27 04:24:07,320 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.26 vs. limit=10.0 2023-11-27 04:24:25,214 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558900 2023-11-27 04:24:28,364 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5800, loss[loss=0.05136, simple_loss=0.06331, pruned_loss=0.009911, audio_tagging_loss=0.009799, over 15287.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08935, pruned_loss=0.01212, audio_tagging_loss=0.008549, over 3045581.56 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:24:44,055 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.198e+01 8.898e+01 9.503e+01 1.014e+02 1.698e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-27 04:24:53,857 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:25:10,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3726206.6666666665, ans=0.2 2023-11-27 04:25:20,297 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 558950 2023-11-27 04:25:23,511 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5850, loss[loss=0.06693, simple_loss=0.09448, pruned_loss=0.01242, audio_tagging_loss=0.007262, over 13900.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08926, pruned_loss=0.01212, audio_tagging_loss=0.008511, over 3038260.74 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:25:45,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3726406.6666666665, ans=0.125 2023-11-27 04:25:54,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3726473.3333333335, ans=0.95 2023-11-27 04:25:58,993 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.29 vs. 
limit=15.0 2023-11-27 04:25:59,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3726540.0, ans=0.125 2023-11-27 04:26:04,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3726540.0, ans=0.1 2023-11-27 04:26:15,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3726606.6666666665, ans=0.125 2023-11-27 04:26:16,685 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559000 2023-11-27 04:26:16,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3726606.6666666665, ans=0.0 2023-11-27 04:26:20,664 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5900, loss[loss=0.08148, simple_loss=0.1136, pruned_loss=0.0153, audio_tagging_loss=0.009376, over 15362.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08986, pruned_loss=0.01212, audio_tagging_loss=0.008479, over 3038513.66 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:26:30,364 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.26 vs. limit=12.0 2023-11-27 04:26:37,154 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.616e+01 9.105e+01 9.736e+01 1.055e+02 1.471e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-27 04:26:40,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3726740.0, ans=0.125 2023-11-27 04:27:13,045 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559050 2023-11-27 04:27:16,236 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 5950, loss[loss=0.06676, simple_loss=0.08768, pruned_loss=0.01274, audio_tagging_loss=0.01018, over 15524.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08982, pruned_loss=0.01202, audio_tagging_loss=0.008484, over 3047312.26 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:27:19,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3727006.6666666665, ans=0.125 2023-11-27 04:27:19,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3727006.6666666665, ans=0.125 2023-11-27 04:27:29,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3727073.3333333335, ans=0.1 2023-11-27 04:27:35,853 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.24 vs. limit=10.0 2023-11-27 04:28:07,739 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559100 2023-11-27 04:28:10,860 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6000, loss[loss=0.05523, simple_loss=0.08217, pruned_loss=0.006031, audio_tagging_loss=0.008118, over 15443.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08965, pruned_loss=0.01199, audio_tagging_loss=0.008456, over 3043043.32 frames. 
], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:28:10,862 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-27 04:28:43,426 INFO [train_asr.py:1267] (0/4) Epoch 47, validation: loss=0.05733, simple_loss=0.05048, pruned_loss=0.005338, audio_tagging_loss=0.02675, over 4681554.00 frames. 2023-11-27 04:28:43,426 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-27 04:28:59,632 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.659e+01 9.039e+01 9.599e+01 1.058e+02 1.819e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-27 04:29:04,646 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.84 vs. limit=10.0 2023-11-27 04:29:08,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3727473.3333333335, ans=0.125 2023-11-27 04:29:08,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3727473.3333333335, ans=0.2 2023-11-27 04:29:18,111 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.09 vs. limit=12.0 2023-11-27 04:29:21,492 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 04:29:24,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3727540.0, ans=0.125 2023-11-27 04:29:35,317 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.77 vs. limit=15.0 2023-11-27 04:29:35,847 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559150 2023-11-27 04:29:39,053 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6050, loss[loss=0.06714, simple_loss=0.08521, pruned_loss=0.01378, audio_tagging_loss=0.01075, over 15357.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08911, pruned_loss=0.0118, audio_tagging_loss=0.00848, over 3045803.57 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:29:55,604 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.57 vs. limit=15.0 2023-11-27 04:30:01,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3727806.6666666665, ans=0.125 2023-11-27 04:30:27,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3727940.0, ans=0.1 2023-11-27 04:30:31,212 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559200 2023-11-27 04:30:34,600 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6100, loss[loss=0.07274, simple_loss=0.09561, pruned_loss=0.01276, audio_tagging_loss=0.01217, over 16046.00 frames. ], tot_loss[loss=0.06442, simple_loss=0.08831, pruned_loss=0.01174, audio_tagging_loss=0.00853, over 3053315.05 frames. 
], batch size: 61, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:30:35,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3728006.6666666665, ans=0.125 2023-11-27 04:30:40,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3728006.6666666665, ans=0.0 2023-11-27 04:30:46,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3728073.3333333335, ans=0.125 2023-11-27 04:30:53,479 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.728e+01 8.839e+01 9.378e+01 9.986e+01 1.390e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-27 04:30:57,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3728140.0, ans=0.125 2023-11-27 04:30:58,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3728140.0, ans=0.125 2023-11-27 04:31:26,135 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559250 2023-11-27 04:31:30,229 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6150, loss[loss=0.07638, simple_loss=0.105, pruned_loss=0.01672, audio_tagging_loss=0.007169, over 14535.00 frames. ], tot_loss[loss=0.06453, simple_loss=0.08859, pruned_loss=0.01166, audio_tagging_loss=0.008573, over 3054605.92 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:31:30,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3728340.0, ans=0.2 2023-11-27 04:31:48,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3728406.6666666665, ans=0.2 2023-11-27 04:32:08,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3728540.0, ans=0.125 2023-11-27 04:32:10,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3728540.0, ans=0.0 2023-11-27 04:32:15,569 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.71 vs. limit=15.0 2023-11-27 04:32:22,918 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559300 2023-11-27 04:32:26,097 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6200, loss[loss=0.0782, simple_loss=0.1148, pruned_loss=0.01347, audio_tagging_loss=0.007326, over 15566.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08953, pruned_loss=0.01183, audio_tagging_loss=0.008642, over 3054491.23 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:32:42,997 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.527e+01 8.961e+01 9.532e+01 1.029e+02 1.294e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-27 04:32:51,175 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.65 vs. 
limit=15.0 2023-11-27 04:33:12,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3728940.0, ans=0.0 2023-11-27 04:33:17,908 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559350 2023-11-27 04:33:21,002 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6250, loss[loss=0.0695, simple_loss=0.09713, pruned_loss=0.012, audio_tagging_loss=0.00894, over 14926.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08931, pruned_loss=0.012, audio_tagging_loss=0.008681, over 3052893.64 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:33:25,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3729006.6666666665, ans=0.0 2023-11-27 04:33:34,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3729073.3333333335, ans=0.1 2023-11-27 04:33:52,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3729140.0, ans=0.125 2023-11-27 04:33:55,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3729206.6666666665, ans=0.125 2023-11-27 04:33:58,133 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.06 vs. limit=15.0 2023-11-27 04:34:08,466 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.26 vs. limit=22.5 2023-11-27 04:34:13,108 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559400 2023-11-27 04:34:16,470 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6300, loss[loss=0.08058, simple_loss=0.1197, pruned_loss=0.01423, audio_tagging_loss=0.006488, over 16135.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09022, pruned_loss=0.01216, audio_tagging_loss=0.008778, over 3054439.78 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:34:35,449 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.760e+01 8.982e+01 9.657e+01 1.038e+02 1.298e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-27 04:34:40,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3729473.3333333335, ans=0.0 2023-11-27 04:34:41,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3729473.3333333335, ans=0.0 2023-11-27 04:34:46,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3729473.3333333335, ans=0.0 2023-11-27 04:34:46,721 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.00 vs. 
limit=10.0 2023-11-27 04:34:48,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3729473.3333333335, ans=0.0 2023-11-27 04:34:51,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3729540.0, ans=0.0 2023-11-27 04:35:01,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3729606.6666666665, ans=0.125 2023-11-27 04:35:10,058 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559450 2023-11-27 04:35:11,691 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.98 vs. limit=12.0 2023-11-27 04:35:13,205 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6350, loss[loss=0.04567, simple_loss=0.05263, pruned_loss=0.008904, audio_tagging_loss=0.01045, over 14168.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09037, pruned_loss=0.01208, audio_tagging_loss=0.008773, over 3058348.41 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:35:30,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3729740.0, ans=0.125 2023-11-27 04:35:31,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3729740.0, ans=0.0 2023-11-27 04:35:36,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3729806.6666666665, ans=0.125 2023-11-27 04:35:41,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3729806.6666666665, ans=0.1 2023-11-27 04:35:50,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3729873.3333333335, ans=0.2 2023-11-27 04:35:51,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3729873.3333333335, ans=0.125 2023-11-27 04:35:55,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3729873.3333333335, ans=0.0 2023-11-27 04:35:56,584 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:36:04,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3729940.0, ans=0.0 2023-11-27 04:36:05,617 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559500 2023-11-27 04:36:08,719 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6400, loss[loss=0.0622, simple_loss=0.0868, pruned_loss=0.009438, audio_tagging_loss=0.009364, over 14911.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08952, pruned_loss=0.01194, audio_tagging_loss=0.00894, over 3048937.97 frames. 
], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 04:36:26,643 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.785e+01 9.070e+01 9.519e+01 1.025e+02 1.551e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-27 04:36:56,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3730273.3333333335, ans=0.125 2023-11-27 04:37:00,628 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559550 2023-11-27 04:37:03,652 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6450, loss[loss=0.06272, simple_loss=0.08835, pruned_loss=0.01005, audio_tagging_loss=0.0085, over 16299.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.0897, pruned_loss=0.01202, audio_tagging_loss=0.009021, over 3049667.68 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 04:37:15,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3730406.6666666665, ans=0.125 2023-11-27 04:37:20,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3730406.6666666665, ans=0.07 2023-11-27 04:37:27,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3730473.3333333335, ans=0.1 2023-11-27 04:37:31,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3730473.3333333335, ans=0.0 2023-11-27 04:37:45,182 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.67 vs. limit=15.0 2023-11-27 04:37:54,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3730606.6666666665, ans=0.0 2023-11-27 04:37:56,974 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559600 2023-11-27 04:38:00,424 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6500, loss[loss=0.06781, simple_loss=0.09193, pruned_loss=0.00945, audio_tagging_loss=0.01239, over 14959.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08961, pruned_loss=0.01196, audio_tagging_loss=0.008989, over 3047916.23 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 04:38:00,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3730673.3333333335, ans=0.125 2023-11-27 04:38:09,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3730673.3333333335, ans=0.04949747468305833 2023-11-27 04:38:18,000 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.985e+01 8.947e+01 9.565e+01 1.041e+02 1.320e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-27 04:38:23,875 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.34 vs. 
limit=22.5 2023-11-27 04:38:25,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3730806.6666666665, ans=0.125 2023-11-27 04:38:42,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3730873.3333333335, ans=0.1 2023-11-27 04:38:53,171 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559650 2023-11-27 04:38:56,214 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6550, loss[loss=0.08824, simple_loss=0.1207, pruned_loss=0.02143, audio_tagging_loss=0.006462, over 15001.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08951, pruned_loss=0.01205, audio_tagging_loss=0.008883, over 3049453.41 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 04:39:01,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3731006.6666666665, ans=0.0 2023-11-27 04:39:11,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3731073.3333333335, ans=0.07 2023-11-27 04:39:43,201 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.42 vs. limit=22.5 2023-11-27 04:39:47,927 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559700 2023-11-27 04:39:49,160 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:39:51,016 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6600, loss[loss=0.05784, simple_loss=0.08307, pruned_loss=0.007927, audio_tagging_loss=0.008379, over 14840.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08961, pruned_loss=0.01201, audio_tagging_loss=0.008713, over 3047063.17 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 04:39:57,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3731340.0, ans=0.07 2023-11-27 04:40:02,816 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.96 vs. 
limit=6.0 2023-11-27 04:40:03,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3731406.6666666665, ans=0.125 2023-11-27 04:40:09,655 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.811e+01 8.917e+01 9.565e+01 1.016e+02 1.189e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-27 04:40:27,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3731540.0, ans=0.125 2023-11-27 04:40:29,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3731540.0, ans=0.1 2023-11-27 04:40:29,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3731540.0, ans=0.125 2023-11-27 04:40:38,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3731606.6666666665, ans=0.05 2023-11-27 04:40:44,491 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559750 2023-11-27 04:40:47,626 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6650, loss[loss=0.07297, simple_loss=0.1009, pruned_loss=0.01372, audio_tagging_loss=0.008811, over 15259.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08911, pruned_loss=0.01197, audio_tagging_loss=0.008721, over 3050246.77 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:40:51,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3731673.3333333335, ans=0.1 2023-11-27 04:40:55,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3731673.3333333335, ans=0.125 2023-11-27 04:41:09,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3731806.6666666665, ans=0.0 2023-11-27 04:41:17,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3731806.6666666665, ans=0.07 2023-11-27 04:41:32,481 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.60 vs. limit=15.0 2023-11-27 04:41:39,529 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559800 2023-11-27 04:41:42,943 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6700, loss[loss=0.06208, simple_loss=0.0857, pruned_loss=0.009903, audio_tagging_loss=0.009327, over 15235.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08967, pruned_loss=0.01204, audio_tagging_loss=0.008644, over 3048025.82 frames. 
], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:41:56,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3732073.3333333335, ans=10.0 2023-11-27 04:42:03,040 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.591e+01 8.829e+01 9.518e+01 1.003e+02 1.219e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-27 04:42:15,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3732206.6666666665, ans=0.2 2023-11-27 04:42:20,545 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.34 vs. limit=12.0 2023-11-27 04:42:30,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3732273.3333333335, ans=0.125 2023-11-27 04:42:35,427 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559850 2023-11-27 04:42:38,560 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6750, loss[loss=0.07664, simple_loss=0.1013, pruned_loss=0.01594, audio_tagging_loss=0.01005, over 15681.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08925, pruned_loss=0.01199, audio_tagging_loss=0.008608, over 3038457.75 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 04:42:47,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3732340.0, ans=0.2 2023-11-27 04:43:03,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3732473.3333333335, ans=0.125 2023-11-27 04:43:04,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3732473.3333333335, ans=0.1 2023-11-27 04:43:10,393 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.98 vs. limit=15.0 2023-11-27 04:43:25,627 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.81 vs. limit=22.5 2023-11-27 04:43:31,371 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559900 2023-11-27 04:43:31,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3732606.6666666665, ans=0.95 2023-11-27 04:43:34,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3732673.3333333335, ans=0.0 2023-11-27 04:43:35,035 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6800, loss[loss=0.07384, simple_loss=0.1077, pruned_loss=0.01385, audio_tagging_loss=0.006138, over 15770.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08852, pruned_loss=0.01185, audio_tagging_loss=0.008525, over 3038173.84 frames. 
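
[Note] The optim.py:476 lines summarize the optimizer's adaptive gradient clipping. The five numbers are quantiles (min, 25%, median, 75%, max) of recent gradient norms, and the threshold tracks a multiple of the median: in the entry just above, 2.0 x 9.518e+01 = 1.904e+02, exactly the logged threshold, and percent-clipped=0.0 says no recent batch exceeded it. A sketch of that rule under those assumptions (not the actual ScaledAdam code):

    import torch

    def clipping_threshold(grad_norms, clipping_scale=2.0):
        """Threshold = clipping_scale x median of recent gradient norms,
        as the logged numbers suggest. Illustrative only."""
        norms = torch.tensor(grad_norms)
        threshold = clipping_scale * norms.median()
        percent_clipped = 100.0 * (norms > threshold).float().mean()
        return threshold.item(), percent_clipped.item()

    # Quantiles from the entry above: 7.591e+01 ... 9.518e+01 (median) ... 1.219e+02
    thr, pct = clipping_threshold([75.91, 88.29, 95.18, 100.3, 121.9])
    print(f"{thr:.2f} {pct:.1f}")  # -> 190.36 0.0, matching threshold=1.904e+02
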
], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:43:52,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3732740.0, ans=0.125 2023-11-27 04:43:54,088 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 9.067e+01 9.528e+01 1.036e+02 1.458e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-27 04:43:57,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3732806.6666666665, ans=0.1 2023-11-27 04:43:57,857 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.59 vs. limit=22.5 2023-11-27 04:44:00,563 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.25 vs. limit=15.0 2023-11-27 04:44:20,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3732940.0, ans=0.125 2023-11-27 04:44:22,272 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.33 vs. limit=10.0 2023-11-27 04:44:25,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3732940.0, ans=0.0 2023-11-27 04:44:26,874 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 559950 2023-11-27 04:44:26,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3732940.0, ans=0.1 2023-11-27 04:44:27,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3732940.0, ans=0.0 2023-11-27 04:44:28,282 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=22.5 2023-11-27 04:44:29,482 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.78 vs. limit=15.0 2023-11-27 04:44:29,964 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6850, loss[loss=0.07104, simple_loss=0.09946, pruned_loss=0.01131, audio_tagging_loss=0.009992, over 16072.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08854, pruned_loss=0.01189, audio_tagging_loss=0.0085, over 3035886.72 frames. 
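
[Note] The loss fields in these records are internally consistent with a fixed weighted sum, loss = 0.5 x simple_loss + pruned_loss + 1.0 x audio_tagging_loss; the weights 0.5 and 1.0 are inferred from the numbers themselves, which fit exactly across the records in this section. Checking against the batch 6850 running average just above:

    # Recompute the reported tot_loss from its components (batch 6850 above).
    simple_loss, pruned_loss, audio_tagging_loss = 0.08854, 0.01189, 0.0085
    loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
    print(round(loss, 5))  # -> 0.06466, exactly the logged tot_loss
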
], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:44:31,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3733006.6666666665, ans=0.0 2023-11-27 04:44:34,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3733006.6666666665, ans=10.0 2023-11-27 04:44:37,612 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:45:18,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3733273.3333333335, ans=0.1 2023-11-27 04:45:19,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3733273.3333333335, ans=0.0 2023-11-27 04:45:21,853 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560000 2023-11-27 04:45:23,126 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-560000.pt 2023-11-27 04:45:27,244 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6900, loss[loss=0.07238, simple_loss=0.09797, pruned_loss=0.01552, audio_tagging_loss=0.007866, over 14748.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08931, pruned_loss=0.01199, audio_tagging_loss=0.008543, over 3039591.95 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:45:33,539 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.17 vs. limit=15.0 2023-11-27 04:45:38,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3733406.6666666665, ans=0.1 2023-11-27 04:45:40,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3733406.6666666665, ans=0.125 2023-11-27 04:45:44,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3733406.6666666665, ans=0.2 2023-11-27 04:45:48,391 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.525e+01 8.976e+01 9.495e+01 1.031e+02 1.745e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-27 04:46:09,627 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 04:46:20,312 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560050 2023-11-27 04:46:21,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3733606.6666666665, ans=0.0 2023-11-27 04:46:23,933 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 6950, loss[loss=0.06321, simple_loss=0.08833, pruned_loss=0.0119, audio_tagging_loss=0.007142, over 15052.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.09044, pruned_loss=0.01215, audio_tagging_loss=0.008509, over 3038608.62 frames. 
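
[Note] The checkpoint.py:75 line above fires because the global batch index has just reached 560000, evidently a multiple of a fixed save interval; checkpoints are keyed by batch index (checkpoint-560000.pt) rather than epoch, so training can resume mid-epoch. A hedged sketch of that pattern (field names are illustrative, and the interval of 4000 is an assumption, not stated in these lines):

    import torch

    def maybe_save_checkpoint(exp_dir, batch_idx, model, optimizer,
                              scheduler, scaler, save_every_n=4000):
        # Batch-keyed checkpoints like checkpoint-560000.pt above.
        if batch_idx % save_every_n != 0:
            return
        torch.save(
            {
                "batch_idx_train": batch_idx,
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "scheduler": scheduler.state_dict(),
                "grad_scaler": scaler.state_dict(),
            },
            f"{exp_dir}/checkpoint-{batch_idx}.pt",
        )
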
], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:46:42,973 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=11.49 vs. limit=12.0 2023-11-27 04:46:50,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3733806.6666666665, ans=0.2 2023-11-27 04:47:08,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3733940.0, ans=0.2 2023-11-27 04:47:16,349 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560100 2023-11-27 04:47:19,469 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7000, loss[loss=0.0488, simple_loss=0.07058, pruned_loss=0.0051, audio_tagging_loss=0.008409, over 14902.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08964, pruned_loss=0.01202, audio_tagging_loss=0.008519, over 3043166.21 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 04:47:22,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3734006.6666666665, ans=0.0 2023-11-27 04:47:34,914 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.40 vs. limit=15.0 2023-11-27 04:47:39,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3734073.3333333335, ans=0.07 2023-11-27 04:47:39,973 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 8.711e+01 9.428e+01 1.006e+02 1.288e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-27 04:47:57,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3734206.6666666665, ans=0.125 2023-11-27 04:48:09,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3734273.3333333335, ans=0.125 2023-11-27 04:48:11,209 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560150 2023-11-27 04:48:14,241 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7050, loss[loss=0.06047, simple_loss=0.08028, pruned_loss=0.01209, audio_tagging_loss=0.008234, over 14600.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08928, pruned_loss=0.01185, audio_tagging_loss=0.008606, over 3035957.41 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 4.0 2023-11-27 04:48:17,021 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.52 vs. limit=15.0 2023-11-27 04:48:33,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3734406.6666666665, ans=0.125 2023-11-27 04:48:58,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3734606.6666666665, ans=0.1 2023-11-27 04:49:06,579 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560200 2023-11-27 04:49:10,408 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7100, loss[loss=0.06495, simple_loss=0.07757, pruned_loss=0.01073, audio_tagging_loss=0.01543, over 14326.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08973, pruned_loss=0.01188, audio_tagging_loss=0.008732, over 3045721.05 frames. 
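
[Note] The train_asr.py:1481 "Exclude cut" warnings (one appears a few records back, and they recur below) drop clips that are too short for the transducer loss: 100 input frames shrink to 23 after subsampling, and 23 encoder frames cannot align 24 BPE tokens. A sketch of the check, assuming the usual two-stage convolutional subsampling arithmetic:

    def frames_after_subsampling(t: int) -> int:
        # Approximate Conv2dSubsampling arithmetic (assumed here):
        # a 7-frame context reduction followed by two stride-2 stages.
        return ((t - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A transducer needs at least as many encoder frames as tokens.
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # -> 23, as in the warning
    print(keep_cut(100, 24))              # -> False: the cut is excluded
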
], batch size: 56, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 04:49:18,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3734673.3333333335, ans=0.1 2023-11-27 04:49:23,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3734740.0, ans=0.125 2023-11-27 04:49:24,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3734740.0, ans=0.09899494936611666 2023-11-27 04:49:27,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3734740.0, ans=0.125 2023-11-27 04:49:32,044 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.100e+01 9.003e+01 9.804e+01 1.066e+02 3.214e+02, threshold=1.961e+02, percent-clipped=1.0 2023-11-27 04:49:38,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=3734806.6666666665, ans=15.0 2023-11-27 04:50:02,375 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560250 2023-11-27 04:50:05,481 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7150, loss[loss=0.06518, simple_loss=0.09107, pruned_loss=0.009981, audio_tagging_loss=0.009668, over 14477.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08962, pruned_loss=0.01199, audio_tagging_loss=0.008797, over 3043373.76 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 04:50:20,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3735073.3333333335, ans=0.125 2023-11-27 04:50:31,831 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:50:43,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3735206.6666666665, ans=0.1 2023-11-27 04:50:49,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3735273.3333333335, ans=0.125 2023-11-27 04:50:57,367 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560300 2023-11-27 04:51:00,452 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7200, loss[loss=0.07599, simple_loss=0.1118, pruned_loss=0.01223, audio_tagging_loss=0.007873, over 15250.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08972, pruned_loss=0.01206, audio_tagging_loss=0.00895, over 3043701.06 frames. 
], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:51:23,661 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.638e+01 8.928e+01 9.518e+01 1.020e+02 1.389e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-27 04:51:23,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3735473.3333333335, ans=0.0 2023-11-27 04:51:26,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3735473.3333333335, ans=0.125 2023-11-27 04:51:40,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3735540.0, ans=0.125 2023-11-27 04:51:47,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3735606.6666666665, ans=0.125 2023-11-27 04:51:47,462 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.22 vs. limit=15.0 2023-11-27 04:51:52,255 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560350 2023-11-27 04:51:54,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3735673.3333333335, ans=0.0 2023-11-27 04:51:55,940 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7250, loss[loss=0.06621, simple_loss=0.09296, pruned_loss=0.01287, audio_tagging_loss=0.006864, over 16085.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08992, pruned_loss=0.01211, audio_tagging_loss=0.008988, over 3047533.07 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:51:57,484 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.24 vs. limit=15.0 2023-11-27 04:52:06,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3735673.3333333335, ans=0.07 2023-11-27 04:52:11,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3735740.0, ans=0.125 2023-11-27 04:52:38,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3735873.3333333335, ans=0.125 2023-11-27 04:52:39,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3735940.0, ans=0.125 2023-11-27 04:52:43,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3735940.0, ans=0.2 2023-11-27 04:52:43,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3735940.0, ans=0.0 2023-11-27 04:52:48,920 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560400 2023-11-27 04:52:49,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3735940.0, ans=0.2 2023-11-27 04:52:51,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3736006.6666666665, ans=0.07 2023-11-27 04:52:52,308 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7300, loss[loss=0.06709, simple_loss=0.0932, pruned_loss=0.01266, audio_tagging_loss=0.007835, over 15273.00 frames. 
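
[Note] The scaling.py:1022 "Whitening" lines compare a per-module statistic against a limit (e.g. metric=4.24 vs. limit=15.0 just above). The metric is scale-invariant: it is 1.0 when the feature covariance within each group is a multiple of the identity and grows as the covariance becomes anisotropic, and the regularizer only pushes back once the limit is exceeded, which is why most entries sit below their limits. A rough reconstruction of such a metric (illustrative, not the exact icefall formula):

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
        """x: (num_frames, num_channels). Roughly: mean squared eigenvalue
        of each group's covariance divided by the squared mean eigenvalue,
        so 1.0 for perfectly 'white' features and larger otherwise."""
        n, c = x.shape
        cg = c // num_groups
        xg = x.reshape(n, num_groups, cg).permute(1, 0, 2)   # (groups, n, cg)
        cov = xg.transpose(1, 2) @ xg / n                    # (groups, cg, cg)
        mean_eig_sq = (cov * cov).sum(dim=(1, 2)) / cg       # mean eigenvalue^2
        sq_mean_eig = torch.diagonal(cov, dim1=1, dim2=2).mean(dim=1) ** 2
        return (mean_eig_sq / sq_mean_eig).mean().item()

    # White noise scores ~1.0, far below limits like 15.0 or 22.5:
    print(whitening_metric(torch.randn(10000, 384), num_groups=1))
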
], tot_loss[loss=0.06574, simple_loss=0.08975, pruned_loss=0.01202, audio_tagging_loss=0.008846, over 3054976.01 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:52:58,258 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2023-11-27 04:53:01,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3736006.6666666665, ans=0.0 2023-11-27 04:53:04,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3736073.3333333335, ans=0.0 2023-11-27 04:53:11,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3736073.3333333335, ans=0.2 2023-11-27 04:53:13,093 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.13 vs. limit=22.5 2023-11-27 04:53:13,373 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 8.819e+01 9.611e+01 1.034e+02 1.337e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-27 04:53:14,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3736140.0, ans=0.125 2023-11-27 04:53:32,898 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.71 vs. limit=22.5 2023-11-27 04:53:43,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3736273.3333333335, ans=0.0 2023-11-27 04:53:44,156 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560450 2023-11-27 04:53:44,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3736273.3333333335, ans=0.125 2023-11-27 04:53:47,242 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7350, loss[loss=0.06337, simple_loss=0.08441, pruned_loss=0.01299, audio_tagging_loss=0.008176, over 15393.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09038, pruned_loss=0.01216, audio_tagging_loss=0.008709, over 3061339.31 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:53:48,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3736340.0, ans=0.125 2023-11-27 04:54:13,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3736473.3333333335, ans=0.2 2023-11-27 04:54:17,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3736473.3333333335, ans=0.2 2023-11-27 04:54:32,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3736606.6666666665, ans=0.125 2023-11-27 04:54:37,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3736606.6666666665, ans=0.0 2023-11-27 04:54:38,289 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.70 vs. 
limit=15.0 2023-11-27 04:54:38,963 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560500 2023-11-27 04:54:42,037 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7400, loss[loss=0.07618, simple_loss=0.1099, pruned_loss=0.01588, audio_tagging_loss=0.005365, over 15990.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08951, pruned_loss=0.01215, audio_tagging_loss=0.008713, over 3060996.67 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:54:48,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3736673.3333333335, ans=0.0 2023-11-27 04:54:55,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3736740.0, ans=0.0 2023-11-27 04:55:04,910 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.57 vs. limit=15.0 2023-11-27 04:55:05,349 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.674e+01 8.988e+01 9.437e+01 1.029e+02 2.461e+02, threshold=1.887e+02, percent-clipped=1.0 2023-11-27 04:55:06,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3736806.6666666665, ans=0.125 2023-11-27 04:55:14,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=3736806.6666666665, ans=0.2 2023-11-27 04:55:14,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3736806.6666666665, ans=0.125 2023-11-27 04:55:21,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3736873.3333333335, ans=0.125 2023-11-27 04:55:26,262 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=15.0 2023-11-27 04:55:35,699 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560550 2023-11-27 04:55:39,274 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7450, loss[loss=0.05163, simple_loss=0.06391, pruned_loss=0.009996, audio_tagging_loss=0.009677, over 15800.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.09024, pruned_loss=0.01225, audio_tagging_loss=0.008555, over 3059134.60 frames. ], batch size: 64, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:55:39,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3737006.6666666665, ans=0.1 2023-11-27 04:55:44,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3737006.6666666665, ans=0.0 2023-11-27 04:55:51,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3737073.3333333335, ans=0.125 2023-11-27 04:56:01,949 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.47 vs. 
limit=12.0 2023-11-27 04:56:20,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3737206.6666666665, ans=0.125 2023-11-27 04:56:27,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3737273.3333333335, ans=0.0 2023-11-27 04:56:31,237 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560600 2023-11-27 04:56:34,661 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7500, loss[loss=0.07455, simple_loss=0.1119, pruned_loss=0.01198, audio_tagging_loss=0.006619, over 16034.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09066, pruned_loss=0.01233, audio_tagging_loss=0.008523, over 3061895.45 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:56:47,955 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.76 vs. limit=15.0 2023-11-27 04:56:56,833 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.600e+01 8.930e+01 9.677e+01 1.022e+02 1.348e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-27 04:57:18,271 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:57:18,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3737606.6666666665, ans=0.025 2023-11-27 04:57:20,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3737606.6666666665, ans=0.1 2023-11-27 04:57:23,800 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0 2023-11-27 04:57:25,947 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.26 vs. limit=15.0 2023-11-27 04:57:26,560 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560650 2023-11-27 04:57:29,691 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7550, loss[loss=0.05788, simple_loss=0.07544, pruned_loss=0.008911, audio_tagging_loss=0.01125, over 14229.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.09017, pruned_loss=0.01227, audio_tagging_loss=0.008489, over 3063605.52 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:57:32,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3737673.3333333335, ans=0.1 2023-11-27 04:58:10,402 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.10 vs. limit=15.0 2023-11-27 04:58:17,348 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.38 vs. limit=15.0 2023-11-27 04:58:23,184 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560700 2023-11-27 04:58:24,788 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.73 vs. 
limit=15.0 2023-11-27 04:58:25,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3738006.6666666665, ans=0.0 2023-11-27 04:58:26,292 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7600, loss[loss=0.05546, simple_loss=0.07239, pruned_loss=0.0117, audio_tagging_loss=0.007572, over 13949.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08914, pruned_loss=0.01216, audio_tagging_loss=0.008507, over 3046712.56 frames. ], batch size: 53, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 04:58:27,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3738006.6666666665, ans=0.125 2023-11-27 04:58:30,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3738006.6666666665, ans=0.2 2023-11-27 04:58:32,851 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.28 vs. limit=10.0 2023-11-27 04:58:33,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3738006.6666666665, ans=0.2 2023-11-27 04:58:48,035 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.645e+01 8.653e+01 9.351e+01 1.003e+02 1.286e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-27 04:58:51,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3738140.0, ans=0.09899494936611666 2023-11-27 04:59:00,338 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:59:17,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3738273.3333333335, ans=0.0 2023-11-27 04:59:18,821 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560750 2023-11-27 04:59:22,059 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7650, loss[loss=0.05438, simple_loss=0.06691, pruned_loss=0.009257, audio_tagging_loss=0.01167, over 13437.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.0888, pruned_loss=0.01205, audio_tagging_loss=0.008435, over 3038312.25 frames. ], batch size: 53, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 04:59:29,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3738340.0, ans=0.0 2023-11-27 04:59:50,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3738473.3333333335, ans=0.125 2023-11-27 05:00:00,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3738540.0, ans=0.1 2023-11-27 05:00:06,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3738606.6666666665, ans=0.0 2023-11-27 05:00:10,024 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.10 vs. limit=15.0 2023-11-27 05:00:13,174 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.46 vs. 
limit=15.0 2023-11-27 05:00:13,734 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560800 2023-11-27 05:00:17,086 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7700, loss[loss=0.07623, simple_loss=0.09635, pruned_loss=0.01472, audio_tagging_loss=0.01333, over 14825.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08854, pruned_loss=0.0119, audio_tagging_loss=0.008501, over 3039590.69 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:00:25,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3738673.3333333335, ans=0.0 2023-11-27 05:00:31,356 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.27 vs. limit=22.5 2023-11-27 05:00:32,359 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.33 vs. limit=12.0 2023-11-27 05:00:36,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3738740.0, ans=0.2 2023-11-27 05:00:40,060 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.670e+01 9.124e+01 9.752e+01 1.036e+02 1.277e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-27 05:00:46,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3738806.6666666665, ans=0.07 2023-11-27 05:01:09,983 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560850 2023-11-27 05:01:10,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3738940.0, ans=0.0 2023-11-27 05:01:13,598 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7750, loss[loss=0.0626, simple_loss=0.09065, pruned_loss=0.009701, audio_tagging_loss=0.00757, over 15284.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08934, pruned_loss=0.012, audio_tagging_loss=0.008456, over 3043634.44 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:01:34,991 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.82 vs. limit=15.0 2023-11-27 05:01:39,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3739140.0, ans=0.0 2023-11-27 05:01:56,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3739206.6666666665, ans=0.2 2023-11-27 05:01:58,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3739273.3333333335, ans=0.125 2023-11-27 05:02:05,325 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560900 2023-11-27 05:02:05,935 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.13 vs. limit=15.0 2023-11-27 05:02:08,478 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7800, loss[loss=0.07075, simple_loss=0.09985, pruned_loss=0.01555, audio_tagging_loss=0.005283, over 15024.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.0901, pruned_loss=0.0123, audio_tagging_loss=0.008503, over 3040307.35 frames. 
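
[Note] The grad_scale field in these records is the dynamic loss scale of fp16 training, which is why it wanders through this section: it dips to 4.0 around batch 7050 and is back at 32.0 by batch 7600, halving whenever a batch overflows and growing again after a run of clean steps. A minimal sketch of the mechanism with PyTorch's stock GradScaler; the run's growth cadence evidently differs from the stock defaults:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

    def training_step(model, optimizer, batch, loss_fn):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = loss_fn(model(batch))
        scaler.scale(loss).backward()  # backprop through the scaled loss
        scaler.step(optimizer)         # skips the update if grads overflowed
        scaler.update()                # halves scale on overflow, else grows it
        return scaler.get_scale()      # the value logged as grad_scale
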
], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:02:31,129 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.744e+01 8.958e+01 9.629e+01 1.040e+02 1.238e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-27 05:02:50,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3739540.0, ans=0.2 2023-11-27 05:02:58,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3739606.6666666665, ans=0.125 2023-11-27 05:03:00,423 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 560950 2023-11-27 05:03:03,540 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7850, loss[loss=0.05733, simple_loss=0.07349, pruned_loss=0.01006, audio_tagging_loss=0.01053, over 14833.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08975, pruned_loss=0.01233, audio_tagging_loss=0.00866, over 3045008.90 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:03:28,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3739806.6666666665, ans=0.125 2023-11-27 05:03:32,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3739806.6666666665, ans=0.1 2023-11-27 05:03:56,064 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561000 2023-11-27 05:03:59,948 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7900, loss[loss=0.058, simple_loss=0.07726, pruned_loss=0.01053, audio_tagging_loss=0.008838, over 16529.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08892, pruned_loss=0.01212, audio_tagging_loss=0.008791, over 3050616.13 frames. ], batch size: 63, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:04:10,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3740073.3333333335, ans=0.0 2023-11-27 05:04:13,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3740073.3333333335, ans=0.07 2023-11-27 05:04:23,049 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.984e+01 9.150e+01 9.892e+01 1.047e+02 1.288e+02, threshold=1.978e+02, percent-clipped=0.0 2023-11-27 05:04:52,363 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561050 2023-11-27 05:04:55,421 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 7950, loss[loss=0.06321, simple_loss=0.09079, pruned_loss=0.00788, audio_tagging_loss=0.009937, over 14652.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08913, pruned_loss=0.01209, audio_tagging_loss=0.008892, over 3047004.57 frames. ], batch size: 53, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:04:59,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3740340.0, ans=0.0 2023-11-27 05:05:08,756 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 05:05:24,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3740473.3333333335, ans=0.125 2023-11-27 05:05:25,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3740473.3333333335, ans=0.125 2023-11-27 05:05:32,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3740540.0, ans=0.2 2023-11-27 05:05:36,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3740540.0, ans=0.0 2023-11-27 05:05:43,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3740606.6666666665, ans=0.125 2023-11-27 05:05:46,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3740606.6666666665, ans=0.125 2023-11-27 05:05:47,388 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561100 2023-11-27 05:05:51,027 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8000, loss[loss=0.07738, simple_loss=0.1121, pruned_loss=0.01403, audio_tagging_loss=0.007288, over 14949.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08903, pruned_loss=0.01204, audio_tagging_loss=0.008942, over 3043943.27 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:05:58,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3740673.3333333335, ans=0.125 2023-11-27 05:06:05,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3740740.0, ans=0.0 2023-11-27 05:06:14,624 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.003e+01 8.945e+01 9.427e+01 1.017e+02 1.273e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-27 05:06:36,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3740940.0, ans=0.125 2023-11-27 05:06:42,832 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561150 2023-11-27 05:06:46,431 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8050, loss[loss=0.07148, simple_loss=0.1026, pruned_loss=0.01186, audio_tagging_loss=0.008339, over 14713.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.0881, pruned_loss=0.01176, audio_tagging_loss=0.009032, over 3047136.00 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:06:59,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3741073.3333333335, ans=0.125 2023-11-27 05:07:25,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3741206.6666666665, ans=0.125 2023-11-27 05:07:36,587 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.70 vs. limit=15.0 2023-11-27 05:07:39,330 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561200 2023-11-27 05:07:42,683 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8100, loss[loss=0.08554, simple_loss=0.117, pruned_loss=0.01934, audio_tagging_loss=0.007724, over 15510.00 frames. 
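
[Note] The scaling.py:1118 "WithLoss" entries attach an auxiliary penalty directly to a tensor of attention weights; loss-sum=0.000e+00 means the penalty is currently inactive. One way such a regularizer can work, sketched speculatively since the log does not show the mechanism: an identity-forward autograd function that injects the penalty's gradient on the backward pass.

    import torch

    class WithAuxLoss(torch.autograd.Function):
        """Speculative sketch: forward is the identity, backward adds the
        gradient of a penalty on out-of-range activations."""

        @staticmethod
        def forward(ctx, x, limit):
            ctx.save_for_backward(x)
            ctx.limit = limit
            return x

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            with torch.enable_grad():
                xd = x.detach().requires_grad_()
                # zero (hence loss-sum=0.000e+00) while |x| stays in bounds
                penalty = (xd.abs() - ctx.limit).clamp(min=0.0).sum()
                (aux_grad,) = torch.autograd.grad(penalty, xd)
            return grad_out + aux_grad, None

    attn = torch.randn(8, 16, requires_grad=True)
    out = WithAuxLoss.apply(attn, 10.0)  # identical to attn in the forward pass
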
], tot_loss[loss=0.06445, simple_loss=0.08761, pruned_loss=0.0117, audio_tagging_loss=0.008947, over 3054966.00 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:07:43,894 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:07:47,542 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.52 vs. limit=15.0 2023-11-27 05:08:04,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3741473.3333333335, ans=0.125 2023-11-27 05:08:05,307 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.986e+01 9.924e+01 1.066e+02 1.404e+02, threshold=1.985e+02, percent-clipped=0.0 2023-11-27 05:08:24,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3741540.0, ans=0.125 2023-11-27 05:08:26,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3741606.6666666665, ans=0.1 2023-11-27 05:08:34,238 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561250 2023-11-27 05:08:37,356 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8150, loss[loss=0.08643, simple_loss=0.123, pruned_loss=0.01932, audio_tagging_loss=0.005602, over 16001.00 frames. ], tot_loss[loss=0.06446, simple_loss=0.0879, pruned_loss=0.01174, audio_tagging_loss=0.008765, over 3050396.92 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:08:44,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3741673.3333333335, ans=0.125 2023-11-27 05:08:44,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3741673.3333333335, ans=0.125 2023-11-27 05:08:45,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3741673.3333333335, ans=0.2 2023-11-27 05:08:50,159 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.55 vs. limit=15.0 2023-11-27 05:08:51,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3741740.0, ans=0.5 2023-11-27 05:09:26,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3741940.0, ans=0.125 2023-11-27 05:09:29,740 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561300 2023-11-27 05:09:32,792 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8200, loss[loss=0.05372, simple_loss=0.06752, pruned_loss=0.007101, audio_tagging_loss=0.01285, over 14276.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08943, pruned_loss=0.01186, audio_tagging_loss=0.008577, over 3060812.23 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:09:32,839 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 05:09:35,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3742006.6666666665, ans=0.125 2023-11-27 05:09:44,593 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.01 vs. limit=15.0 2023-11-27 05:09:52,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3742073.3333333335, ans=0.1 2023-11-27 05:09:57,529 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.908e+01 9.029e+01 9.641e+01 1.058e+02 1.267e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-27 05:10:00,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3742140.0, ans=0.07 2023-11-27 05:10:04,450 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.31 vs. limit=6.0 2023-11-27 05:10:26,195 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561350 2023-11-27 05:10:29,346 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8250, loss[loss=0.05949, simple_loss=0.08333, pruned_loss=0.009898, audio_tagging_loss=0.00793, over 15168.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08963, pruned_loss=0.01193, audio_tagging_loss=0.008537, over 3059981.80 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:11:13,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3742606.6666666665, ans=0.1 2023-11-27 05:11:21,048 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561400 2023-11-27 05:11:24,418 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8300, loss[loss=0.06205, simple_loss=0.09003, pruned_loss=0.01002, audio_tagging_loss=0.007015, over 15123.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08889, pruned_loss=0.01186, audio_tagging_loss=0.008565, over 3057405.39 frames. 
], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:11:27,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3742673.3333333335, ans=0.0 2023-11-27 05:11:36,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3742740.0, ans=0.0 2023-11-27 05:11:43,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3742740.0, ans=0.0 2023-11-27 05:11:49,846 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.852e+01 8.979e+01 9.695e+01 1.038e+02 1.326e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-27 05:11:54,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3742806.6666666665, ans=0.2 2023-11-27 05:12:09,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3742940.0, ans=0.2 2023-11-27 05:12:16,325 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561450 2023-11-27 05:12:19,006 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.78 vs. limit=15.0 2023-11-27 05:12:19,466 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8350, loss[loss=0.07825, simple_loss=0.1047, pruned_loss=0.01779, audio_tagging_loss=0.008125, over 15882.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08971, pruned_loss=0.01193, audio_tagging_loss=0.008493, over 3054560.34 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:12:20,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3743006.6666666665, ans=0.125 2023-11-27 05:12:23,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3743006.6666666665, ans=0.0 2023-11-27 05:12:31,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3743073.3333333335, ans=0.0 2023-11-27 05:12:43,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3743140.0, ans=0.0 2023-11-27 05:12:48,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3743140.0, ans=0.125 2023-11-27 05:13:13,222 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561500 2023-11-27 05:13:13,929 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.65 vs. limit=15.0 2023-11-27 05:13:16,288 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8400, loss[loss=0.06454, simple_loss=0.08676, pruned_loss=0.01242, audio_tagging_loss=0.008743, over 15099.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08958, pruned_loss=0.01172, audio_tagging_loss=0.008424, over 3056090.05 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:13:21,113 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.33 vs. 
limit=10.0 2023-11-27 05:13:21,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3743340.0, ans=0.0 2023-11-27 05:13:26,455 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.81 vs. limit=15.0 2023-11-27 05:13:39,604 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.675e+01 8.778e+01 9.390e+01 9.920e+01 1.165e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-27 05:14:02,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3743606.6666666665, ans=0.125 2023-11-27 05:14:08,323 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561550 2023-11-27 05:14:11,419 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8450, loss[loss=0.05818, simple_loss=0.08307, pruned_loss=0.007982, audio_tagging_loss=0.00867, over 14637.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08977, pruned_loss=0.01172, audio_tagging_loss=0.008401, over 3052304.33 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:14:42,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3743806.6666666665, ans=0.95 2023-11-27 05:15:03,048 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561600 2023-11-27 05:15:06,445 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8500, loss[loss=0.06809, simple_loss=0.09327, pruned_loss=0.01274, audio_tagging_loss=0.008714, over 15544.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08993, pruned_loss=0.01181, audio_tagging_loss=0.008452, over 3064751.99 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:15:17,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3744073.3333333335, ans=0.125 2023-11-27 05:15:31,432 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.032e+01 9.244e+01 9.812e+01 1.039e+02 1.357e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-27 05:15:52,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3744273.3333333335, ans=0.125 2023-11-27 05:15:58,940 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561650 2023-11-27 05:16:03,177 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8550, loss[loss=0.065, simple_loss=0.09131, pruned_loss=0.01234, audio_tagging_loss=0.00701, over 14489.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08984, pruned_loss=0.01186, audio_tagging_loss=0.008531, over 3064573.72 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:16:44,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3744540.0, ans=0.125 2023-11-27 05:16:54,649 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561700 2023-11-27 05:16:57,855 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8600, loss[loss=0.06236, simple_loss=0.08091, pruned_loss=0.01076, audio_tagging_loss=0.01115, over 15102.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.09007, pruned_loss=0.012, audio_tagging_loss=0.008579, over 3063319.61 frames. 
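
[Note] The lr: 1.43e-03 field stays flat across these batches because the schedule is nearly level this deep into training. The value is consistent with an Eden-style schedule, lr = base_lr x ((step^2 + b^2)/b^2)^(-1/4) x ((epoch^2 + e^2)/e^2)^(-1/4); assuming base_lr 0.045, b = 7500 steps and e = 3.5 epochs, step 560000 in epoch 47 gives about 1.42e-03, matching the log up to the scheduler's exact step accounting. A hedged reconstruction:

    def eden_lr(base_lr, step, epoch, lr_batches=7500.0, lr_epochs=3.5):
        # Eden-style schedule inferred from the logged numbers; the exact
        # icefall step/epoch accounting may differ slightly.
        return (base_lr
                * ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
                * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)

    print(f"{eden_lr(0.045, 560000, 47):.2e}")  # -> 1.42e-03 vs. logged 1.43e-03
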
], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:17:01,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3744673.3333333335, ans=0.0 2023-11-27 05:17:07,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3744740.0, ans=0.0 2023-11-27 05:17:22,245 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 8.987e+01 9.604e+01 1.025e+02 1.300e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-27 05:17:47,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3744940.0, ans=0.07 2023-11-27 05:17:47,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3744940.0, ans=0.1 2023-11-27 05:17:50,183 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561750 2023-11-27 05:17:53,197 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8650, loss[loss=0.05508, simple_loss=0.06946, pruned_loss=0.01025, audio_tagging_loss=0.0101, over 13326.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.0898, pruned_loss=0.01207, audio_tagging_loss=0.00863, over 3058962.82 frames. ], batch size: 52, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:18:02,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3745006.6666666665, ans=0.125 2023-11-27 05:18:09,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3745073.3333333335, ans=0.0 2023-11-27 05:18:14,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3745073.3333333335, ans=0.125 2023-11-27 05:18:33,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3745206.6666666665, ans=0.125 2023-11-27 05:18:33,854 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.25 vs. limit=22.5 2023-11-27 05:18:35,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3745206.6666666665, ans=0.1 2023-11-27 05:18:40,877 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:18:45,592 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561800 2023-11-27 05:18:45,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3745273.3333333335, ans=0.125 2023-11-27 05:18:47,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3745273.3333333335, ans=0.0 2023-11-27 05:18:50,062 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8700, loss[loss=0.07087, simple_loss=0.09034, pruned_loss=0.01524, audio_tagging_loss=0.01046, over 15478.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.0901, pruned_loss=0.01217, audio_tagging_loss=0.008645, over 3054974.18 frames. 
], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:19:01,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3745406.6666666665, ans=0.0 2023-11-27 05:19:07,167 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.21 vs. limit=12.0 2023-11-27 05:19:14,998 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.677e+01 9.080e+01 9.687e+01 1.041e+02 1.884e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-27 05:19:26,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3745540.0, ans=0.125 2023-11-27 05:19:26,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3745540.0, ans=0.0 2023-11-27 05:19:35,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3745606.6666666665, ans=0.0 2023-11-27 05:19:42,773 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561850 2023-11-27 05:19:45,866 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8750, loss[loss=0.07039, simple_loss=0.08892, pruned_loss=0.01515, audio_tagging_loss=0.01078, over 15940.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09091, pruned_loss=0.01239, audio_tagging_loss=0.008645, over 3054764.70 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:19:59,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3745740.0, ans=0.125 2023-11-27 05:20:14,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3745806.6666666665, ans=0.125 2023-11-27 05:20:26,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3745873.3333333335, ans=0.0 2023-11-27 05:20:37,693 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561900 2023-11-27 05:20:40,951 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8800, loss[loss=0.06808, simple_loss=0.09593, pruned_loss=0.01182, audio_tagging_loss=0.008293, over 15920.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09144, pruned_loss=0.01243, audio_tagging_loss=0.008731, over 3049462.17 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:21:07,023 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.717e+01 9.208e+01 9.853e+01 1.073e+02 1.310e+02, threshold=1.971e+02, percent-clipped=0.0 2023-11-27 05:21:27,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3746273.3333333335, ans=0.07 2023-11-27 05:21:32,945 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 561950 2023-11-27 05:21:37,188 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8850, loss[loss=0.06107, simple_loss=0.0778, pruned_loss=0.01223, audio_tagging_loss=0.009949, over 14921.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09089, pruned_loss=0.01237, audio_tagging_loss=0.00879, over 3048603.57 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:21:45,549 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.61 vs. 
limit=15.0 2023-11-27 05:21:46,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3746340.0, ans=0.125 2023-11-27 05:21:47,256 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 05:21:50,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3746406.6666666665, ans=0.125 2023-11-27 05:21:55,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3746406.6666666665, ans=0.0 2023-11-27 05:21:59,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3746473.3333333335, ans=0.125 2023-11-27 05:22:11,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3746540.0, ans=0.125 2023-11-27 05:22:28,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3746606.6666666665, ans=0.125 2023-11-27 05:22:29,709 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562000 2023-11-27 05:22:32,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3746673.3333333335, ans=0.0 2023-11-27 05:22:33,128 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8900, loss[loss=0.06767, simple_loss=0.09068, pruned_loss=0.01445, audio_tagging_loss=0.007883, over 14906.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09168, pruned_loss=0.01256, audio_tagging_loss=0.008737, over 3049808.28 frames. 
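The WARNING above shows the short-utterance filter at work: a 1-second AudioSet clip yields 100 feature frames, the convolutional front end reduces that to 23 frames, and the dummy transcript tokenizes to 24 BPE pieces, so the transducer lattice cannot fit and the cut is dropped. A sketch of that check, assuming the front end subsamples T frames to roughly (T - 7) // 2 // 2 (the real filter is the one logged at train_asr.py:1481):

    # Sketch of the cut-exclusion check seen in the WARNING records.
    # With num_frames = 100 the assumed subsampling gives 23 frames, fewer
    # than the 24 dummy-text tokens, so the cut is excluded: a transducer
    # loss needs at least one encoder frame per output token.
    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        frames_after_subsampling = (num_frames - 7) // 2 // 2
        return frames_after_subsampling >= num_tokens

    assert keep_cut(100, 24) is False   # the excluded AudioSet dummy cuts
    assert keep_cut(1500, 24) is True   # a typical 15 s speech cut passes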
], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:22:35,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3746673.3333333335, ans=0.0 2023-11-27 05:22:38,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3746673.3333333335, ans=0.05 2023-11-27 05:22:42,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3746673.3333333335, ans=0.0 2023-11-27 05:22:49,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3746740.0, ans=0.1 2023-11-27 05:22:50,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3746740.0, ans=0.2 2023-11-27 05:22:59,602 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.698e+01 9.144e+01 9.662e+01 1.039e+02 1.595e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-27 05:23:06,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3746873.3333333335, ans=0.0 2023-11-27 05:23:25,558 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562050 2023-11-27 05:23:28,620 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 8950, loss[loss=0.09009, simple_loss=0.1231, pruned_loss=0.02027, audio_tagging_loss=0.008251, over 15728.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09169, pruned_loss=0.01254, audio_tagging_loss=0.008585, over 3046449.30 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:23:48,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3747073.3333333335, ans=0.125 2023-11-27 05:23:53,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3747140.0, ans=0.1 2023-11-27 05:23:54,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3747140.0, ans=0.0 2023-11-27 05:23:57,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3747140.0, ans=0.0 2023-11-27 05:23:59,880 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:24:06,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3747206.6666666665, ans=0.2 2023-11-27 05:24:15,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3747273.3333333335, ans=0.0 2023-11-27 05:24:17,008 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.51 vs. limit=15.0 2023-11-27 05:24:20,677 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562100 2023-11-27 05:24:24,290 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9000, loss[loss=0.05773, simple_loss=0.07697, pruned_loss=0.01019, audio_tagging_loss=0.009055, over 14507.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09063, pruned_loss=0.01239, audio_tagging_loss=0.008554, over 3048938.13 frames. 
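Most of the scaling.py records in this stretch are ScheduledFloat values: hyperparameters such as skip rates, dropout probabilities and balancer probabilities that are annealed as a piecewise-linear function of batch_count rather than held fixed. A minimal reimplementation of that idea (the real class in scaling.py carries more machinery; the breakpoints below are illustrative):

    # Sketch of a batch-count schedule in the spirit of the ScheduledFloat
    # records above: piecewise-linear in batch_count, clamped at the ends.
    class ScheduledFloatSketch:
        def __init__(self, *points):
            # points: (batch_count, value) pairs, e.g. ((0.0, 0.3), (20000.0, 0.1))
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
            return pts[-1][1]

    # Long after the last breakpoint the schedule sits at its final value,
    # which is why the skip rates logged above are all 0.0 and the dropout_p
    # values all 0.1 by batch_count ~3.7e6.
    dropout = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
    assert dropout.value(3_746_673.0) == 0.1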
], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:24:24,292 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-27 05:24:36,501 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([6.4902, 6.1559, 6.4379, 5.9544], device='cuda:0') 2023-11-27 05:24:56,577 INFO [train_asr.py:1267] (0/4) Epoch 47, validation: loss=0.05848, simple_loss=0.05048, pruned_loss=0.005329, audio_tagging_loss=0.02791, over 4681554.00 frames. 2023-11-27 05:24:56,578 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-27 05:25:08,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3747406.6666666665, ans=0.2 2023-11-27 05:25:23,630 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.971e+01 9.071e+01 9.540e+01 1.018e+02 1.204e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-27 05:25:45,561 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.18 vs. limit=15.0 2023-11-27 05:25:49,083 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562150 2023-11-27 05:25:52,134 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9050, loss[loss=0.06881, simple_loss=0.09763, pruned_loss=0.0124, audio_tagging_loss=0.007591, over 15285.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09203, pruned_loss=0.0126, audio_tagging_loss=0.008421, over 3049489.95 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:25:53,452 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:26:15,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3747806.6666666665, ans=0.0 2023-11-27 05:26:44,561 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562200 2023-11-27 05:26:48,206 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9100, loss[loss=0.05951, simple_loss=0.0833, pruned_loss=0.01035, audio_tagging_loss=0.007517, over 16501.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09075, pruned_loss=0.01235, audio_tagging_loss=0.008442, over 3050204.90 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:27:15,544 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.643e+01 9.080e+01 9.612e+01 1.021e+02 1.425e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-27 05:27:17,210 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.00 vs. limit=15.0 2023-11-27 05:27:26,066 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.09 vs. 
limit=15.0 2023-11-27 05:27:29,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3748206.6666666665, ans=0.2 2023-11-27 05:27:36,392 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:27:38,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3748273.3333333335, ans=0.0 2023-11-27 05:27:40,444 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562250 2023-11-27 05:27:42,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3748340.0, ans=0.0 2023-11-27 05:27:43,542 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9150, loss[loss=0.07968, simple_loss=0.1041, pruned_loss=0.01595, audio_tagging_loss=0.0117, over 15118.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.09051, pruned_loss=0.01225, audio_tagging_loss=0.008489, over 3049681.02 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:27:45,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3748340.0, ans=0.0 2023-11-27 05:27:49,245 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.49 vs. limit=15.0 2023-11-27 05:27:55,201 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.03 vs. limit=10.0 2023-11-27 05:28:02,693 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.07 vs. limit=15.0 2023-11-27 05:28:11,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3748473.3333333335, ans=0.125 2023-11-27 05:28:12,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3748473.3333333335, ans=0.125 2023-11-27 05:28:31,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3748606.6666666665, ans=0.125 2023-11-27 05:28:34,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3748606.6666666665, ans=0.0 2023-11-27 05:28:35,592 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562300 2023-11-27 05:28:39,267 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9200, loss[loss=0.0697, simple_loss=0.1037, pruned_loss=0.009668, audio_tagging_loss=0.0082, over 16183.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09099, pruned_loss=0.0123, audio_tagging_loss=0.008457, over 3048616.30 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:29:01,412 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.89 vs. 
limit=22.5 2023-11-27 05:29:07,164 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.322e+01 9.097e+01 9.873e+01 1.057e+02 1.295e+02, threshold=1.975e+02, percent-clipped=0.0 2023-11-27 05:29:31,601 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562350 2023-11-27 05:29:35,192 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9250, loss[loss=0.06836, simple_loss=0.09911, pruned_loss=0.01116, audio_tagging_loss=0.007652, over 15243.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08996, pruned_loss=0.01191, audio_tagging_loss=0.008447, over 3053648.32 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:30:27,474 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562400 2023-11-27 05:30:30,780 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9300, loss[loss=0.07874, simple_loss=0.1045, pruned_loss=0.01748, audio_tagging_loss=0.009011, over 14789.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08972, pruned_loss=0.01196, audio_tagging_loss=0.008526, over 3053471.27 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:30:42,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3749406.6666666665, ans=0.035 2023-11-27 05:30:58,394 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 9.060e+01 9.666e+01 1.045e+02 1.304e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-27 05:31:07,691 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.91 vs. limit=15.0 2023-11-27 05:31:17,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3749606.6666666665, ans=0.0 2023-11-27 05:31:22,687 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562450 2023-11-27 05:31:25,815 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9350, loss[loss=0.05082, simple_loss=0.07116, pruned_loss=0.007389, audio_tagging_loss=0.007853, over 16708.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.0898, pruned_loss=0.01205, audio_tagging_loss=0.008551, over 3054112.48 frames. ], batch size: 62, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:31:35,373 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.89 vs. limit=22.5 2023-11-27 05:31:37,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3749740.0, ans=10.0 2023-11-27 05:31:47,535 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.30 vs. limit=15.0 2023-11-27 05:31:51,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3749806.6666666665, ans=0.125 2023-11-27 05:32:18,290 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562500 2023-11-27 05:32:21,973 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9400, loss[loss=0.06886, simple_loss=0.09528, pruned_loss=0.0143, audio_tagging_loss=0.006921, over 15362.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08931, pruned_loss=0.01209, audio_tagging_loss=0.008612, over 3050174.95 frames. 
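The optim.py records make the clipping rule visible: each line lists the min/25%/median/75%/max of recent gradient norms, and the reported threshold equals Clipping_scale times the logged median (2.0 * 9.873e+01 = 1.975e+02 in the record just above). Gradients whose norm exceeds the threshold get scaled down, and percent-clipped reports how often that happens. A sketch of that scheme, assuming a simple sliding window of norms (the production logic sits in the optimizer in optim.py):

    import collections

    # Sketch of median-based gradient clipping as suggested by the optim.py
    # records above: threshold = clipping_scale * median of recent grad norms.
    class GradNormClipper:
        def __init__(self, clipping_scale: float = 2.0, window: int = 128):
            self.clipping_scale = clipping_scale
            self.norms = collections.deque(maxlen=window)

        def clip_factor(self, grad_norm: float) -> float:
            self.norms.append(grad_norm)
            median = sorted(self.norms)[len(self.norms) // 2]
            threshold = self.clipping_scale * median
            # Scale the gradient down only when it exceeds the threshold.
            return min(1.0, threshold / grad_norm) if grad_norm > 0 else 1.0

    # Feeding in the quartiles logged above (median ~9.9e+01), even the max
    # norm of 1.295e+02 stays under 2 * median, matching percent-clipped=0.0.
    clipper = GradNormClipper()
    for n in [73.2, 91.0, 98.7, 105.7, 129.5]:
        factor = clipper.clip_factor(n)
    assert factor == 1.0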
], batch size: 58, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:32:23,789 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.15 vs. limit=12.0 2023-11-27 05:32:50,574 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.875e+01 9.326e+01 9.875e+01 1.073e+02 1.247e+02, threshold=1.975e+02, percent-clipped=0.0 2023-11-27 05:33:04,832 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.57 vs. limit=10.0 2023-11-27 05:33:06,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3750273.3333333335, ans=0.125 2023-11-27 05:33:12,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3750273.3333333335, ans=0.0 2023-11-27 05:33:14,823 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562550 2023-11-27 05:33:14,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3750273.3333333335, ans=0.125 2023-11-27 05:33:16,810 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 05:33:17,834 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9450, loss[loss=0.07614, simple_loss=0.1004, pruned_loss=0.01926, audio_tagging_loss=0.00668, over 14342.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08902, pruned_loss=0.01205, audio_tagging_loss=0.008735, over 3051022.49 frames. ], batch size: 53, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:33:43,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3750473.3333333335, ans=0.0 2023-11-27 05:33:54,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3750540.0, ans=0.125 2023-11-27 05:33:56,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3750540.0, ans=0.1 2023-11-27 05:33:56,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3750540.0, ans=0.1 2023-11-27 05:33:57,039 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.88 vs. 
limit=15.0 2023-11-27 05:33:58,725 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:34:09,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3750606.6666666665, ans=0.1 2023-11-27 05:34:10,314 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562600 2023-11-27 05:34:10,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3750606.6666666665, ans=0.2 2023-11-27 05:34:10,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3750606.6666666665, ans=0.0 2023-11-27 05:34:11,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3750606.6666666665, ans=0.125 2023-11-27 05:34:13,606 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9500, loss[loss=0.06772, simple_loss=0.09358, pruned_loss=0.01324, audio_tagging_loss=0.007689, over 15289.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08905, pruned_loss=0.01201, audio_tagging_loss=0.008771, over 3045495.33 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:34:15,165 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.54 vs. limit=15.0 2023-11-27 05:34:42,867 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.626e+01 8.981e+01 9.517e+01 1.036e+02 1.547e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-27 05:35:02,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3750940.0, ans=0.2 2023-11-27 05:35:05,380 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562650 2023-11-27 05:35:08,580 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9550, loss[loss=0.05588, simple_loss=0.07542, pruned_loss=0.008083, audio_tagging_loss=0.01009, over 14900.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08893, pruned_loss=0.01196, audio_tagging_loss=0.00884, over 3045965.76 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:36:02,671 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562700 2023-11-27 05:36:05,788 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9600, loss[loss=0.07409, simple_loss=0.09618, pruned_loss=0.01518, audio_tagging_loss=0.01083, over 15504.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08858, pruned_loss=0.01194, audio_tagging_loss=0.008914, over 3053108.25 frames. 
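Two attention diagnostics recur in these records: the WithLoss lines attach an auxiliary penalty to the attention weights (currently contributing loss-sum=0.000e+00), and at validation time zipformer.py prints attn_weights_entropy, the entropy of each head's attention distribution, where high values mean diffuse attention and values near zero mean a head has collapsed onto single keys. A sketch of the entropy computation, with shapes assumed (the exact reduction in zipformer.py may differ):

    import torch

    # Sketch of the attn_weights_entropy diagnostic printed during validation.
    # attn_weights: (num_heads, query_len, key_len), each row already softmaxed.
    def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
        eps = 1.0e-20
        entropy = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)
        return entropy.mean(dim=-1)   # average over queries -> one value per head

    # Uniform attention over 512 keys gives entropy log(512) ~= 6.24 per head,
    # the same order as the tensor([6.4902, 6.1559, ...]) values logged at the
    # validation step above.
    uniform = torch.full((4, 10, 512), 1.0 / 512)
    print(attn_weights_entropy(uniform))   # ~6.24 for every head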
], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:36:05,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3751340.0, ans=0.1 2023-11-27 05:36:18,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3751406.6666666665, ans=0.0 2023-11-27 05:36:19,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3751406.6666666665, ans=0.125 2023-11-27 05:36:28,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3751473.3333333335, ans=0.5 2023-11-27 05:36:30,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3751473.3333333335, ans=0.1 2023-11-27 05:36:33,686 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.325e+01 9.083e+01 9.744e+01 1.053e+02 1.282e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-27 05:36:39,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3751540.0, ans=0.125 2023-11-27 05:36:50,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3751606.6666666665, ans=0.09899494936611666 2023-11-27 05:36:57,632 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562750 2023-11-27 05:37:00,761 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9650, loss[loss=0.05282, simple_loss=0.07001, pruned_loss=0.009755, audio_tagging_loss=0.008061, over 14664.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08886, pruned_loss=0.01201, audio_tagging_loss=0.00894, over 3047909.38 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:37:00,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3751673.3333333335, ans=0.125 2023-11-27 05:37:06,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3751673.3333333335, ans=0.0 2023-11-27 05:37:17,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=3751740.0, ans=0.02 2023-11-27 05:37:19,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3751740.0, ans=0.0 2023-11-27 05:37:23,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3751806.6666666665, ans=0.0 2023-11-27 05:37:43,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3751873.3333333335, ans=0.125 2023-11-27 05:37:52,802 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562800 2023-11-27 05:37:56,165 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9700, loss[loss=0.0569, simple_loss=0.08514, pruned_loss=0.007716, audio_tagging_loss=0.00662, over 15489.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08918, pruned_loss=0.01188, audio_tagging_loss=0.008694, over 3046207.83 frames. 
], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:38:02,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3752006.6666666665, ans=0.05 2023-11-27 05:38:22,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3752140.0, ans=0.0 2023-11-27 05:38:25,189 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.968e+01 9.559e+01 1.024e+02 1.547e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-27 05:38:25,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3752140.0, ans=0.0 2023-11-27 05:38:48,484 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562850 2023-11-27 05:38:50,856 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.42 vs. limit=5.0 2023-11-27 05:38:52,153 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9750, loss[loss=0.06182, simple_loss=0.08889, pruned_loss=0.007725, audio_tagging_loss=0.009651, over 15186.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08977, pruned_loss=0.01198, audio_tagging_loss=0.008517, over 3041029.06 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:38:58,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3752340.0, ans=0.2 2023-11-27 05:39:20,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3752473.3333333335, ans=0.0 2023-11-27 05:39:20,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3752473.3333333335, ans=0.125 2023-11-27 05:39:26,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3752540.0, ans=0.0 2023-11-27 05:39:37,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3752606.6666666665, ans=0.125 2023-11-27 05:39:44,007 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562900 2023-11-27 05:39:47,163 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9800, loss[loss=0.08447, simple_loss=0.1126, pruned_loss=0.01976, audio_tagging_loss=0.008398, over 15925.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.09053, pruned_loss=0.01225, audio_tagging_loss=0.008472, over 3043287.64 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:39:50,894 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.67 vs. 
limit=15.0 2023-11-27 05:39:58,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3752740.0, ans=0.1 2023-11-27 05:40:06,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3752740.0, ans=0.0 2023-11-27 05:40:17,820 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.921e+01 8.828e+01 9.556e+01 1.035e+02 1.179e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-27 05:40:31,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3752940.0, ans=0.0 2023-11-27 05:40:36,816 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 05:40:39,071 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 562950 2023-11-27 05:40:39,455 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.96 vs. limit=15.0 2023-11-27 05:40:40,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3752940.0, ans=0.0 2023-11-27 05:40:42,205 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9850, loss[loss=0.07005, simple_loss=0.1006, pruned_loss=0.01147, audio_tagging_loss=0.008269, over 15517.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.09064, pruned_loss=0.0122, audio_tagging_loss=0.008503, over 3043114.43 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:40:50,805 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.52 vs. limit=15.0 2023-11-27 05:40:51,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3753006.6666666665, ans=0.125 2023-11-27 05:40:52,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3753073.3333333335, ans=0.125 2023-11-27 05:40:52,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3753073.3333333335, ans=0.125 2023-11-27 05:40:53,922 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.20 vs. 
limit=22.5 2023-11-27 05:40:58,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3753073.3333333335, ans=0.125 2023-11-27 05:41:00,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3753073.3333333335, ans=0.0 2023-11-27 05:41:22,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3753206.6666666665, ans=0.125 2023-11-27 05:41:34,381 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563000 2023-11-27 05:41:38,593 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9900, loss[loss=0.06008, simple_loss=0.08551, pruned_loss=0.009239, audio_tagging_loss=0.008084, over 15227.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.09028, pruned_loss=0.0121, audio_tagging_loss=0.00854, over 3042669.13 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:41:48,083 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.39 vs. limit=15.0 2023-11-27 05:42:07,505 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 9.022e+01 9.450e+01 1.050e+02 2.788e+02, threshold=1.890e+02, percent-clipped=1.0 2023-11-27 05:42:28,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3753606.6666666665, ans=0.1 2023-11-27 05:42:31,035 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563050 2023-11-27 05:42:34,155 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 9950, loss[loss=0.07419, simple_loss=0.1078, pruned_loss=0.01426, audio_tagging_loss=0.006055, over 15356.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09063, pruned_loss=0.01219, audio_tagging_loss=0.008506, over 3045366.27 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:42:41,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3753673.3333333335, ans=0.0 2023-11-27 05:43:02,126 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.72 vs. limit=15.0 2023-11-27 05:43:05,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3753806.6666666665, ans=0.0 2023-11-27 05:43:17,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3753940.0, ans=0.125 2023-11-27 05:43:25,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3753940.0, ans=0.125 2023-11-27 05:43:25,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3753940.0, ans=0.2 2023-11-27 05:43:26,176 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563100 2023-11-27 05:43:29,294 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10000, loss[loss=0.07541, simple_loss=0.1003, pruned_loss=0.01777, audio_tagging_loss=0.007471, over 14852.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09109, pruned_loss=0.01219, audio_tagging_loss=0.008494, over 3042085.31 frames. 
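grad_scale in the batch records is the mixed-precision loss-scaling factor, and it moves over this stretch of the run: 32.0 down to 16.0 and 8.0, then back up to 16.0 a few records below. The standard dynamic scheme halves the scale when an overflow is detected and doubles it again after a run of clean steps. A sketch of that update rule; the factors and growth interval here are illustrative assumptions, not values read from this log:

    # Sketch of dynamic loss scaling consistent with the grad_scale values in
    # the records above (32.0 -> 16.0 -> 8.0 -> 16.0 -> 32.0).
    class DynamicGradScaler:
        def __init__(self, init_scale: float = 32.0, growth_interval: int = 1000):
            self.scale = init_scale
            self.growth_interval = growth_interval
            self._good_steps = 0

        def update(self, found_inf: bool) -> float:
            if found_inf:
                self.scale *= 0.5      # overflow: halve and restart the count
                self._good_steps = 0
            else:
                self._good_steps += 1
                if self._good_steps == self.growth_interval:
                    self.scale *= 2.0  # a clean run: try a larger scale again
                    self._good_steps = 0
            return self.scale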
], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:43:34,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3754006.6666666665, ans=0.125 2023-11-27 05:43:42,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3754073.3333333335, ans=0.07 2023-11-27 05:43:59,986 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.550e+01 8.953e+01 9.636e+01 1.038e+02 1.515e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-27 05:44:12,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3754206.6666666665, ans=0.125 2023-11-27 05:44:13,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3754273.3333333335, ans=0.07 2023-11-27 05:44:16,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3754273.3333333335, ans=0.025 2023-11-27 05:44:21,988 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563150 2023-11-27 05:44:25,015 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10050, loss[loss=0.04727, simple_loss=0.05824, pruned_loss=0.009265, audio_tagging_loss=0.008886, over 15795.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.09019, pruned_loss=0.01206, audio_tagging_loss=0.008522, over 3048004.03 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:44:40,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3754406.6666666665, ans=0.05 2023-11-27 05:44:42,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3754406.6666666665, ans=0.0 2023-11-27 05:44:44,374 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.37 vs. limit=15.0 2023-11-27 05:44:46,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3754406.6666666665, ans=0.125 2023-11-27 05:44:52,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3754473.3333333335, ans=0.2 2023-11-27 05:45:01,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3754540.0, ans=0.0 2023-11-27 05:45:06,019 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.92 vs. limit=22.5 2023-11-27 05:45:18,014 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563200 2023-11-27 05:45:21,477 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10100, loss[loss=0.06565, simple_loss=0.09243, pruned_loss=0.01086, audio_tagging_loss=0.008576, over 15900.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.09043, pruned_loss=0.01196, audio_tagging_loss=0.008531, over 3056273.00 frames. 
], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:45:42,382 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:45:46,330 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.97 vs. limit=10.0 2023-11-27 05:45:51,644 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.566e+01 8.906e+01 9.588e+01 1.046e+02 1.335e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 05:45:52,474 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2023-11-27 05:45:55,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3754873.3333333335, ans=0.125 2023-11-27 05:45:55,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3754873.3333333335, ans=0.0 2023-11-27 05:46:02,318 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.71 vs. limit=15.0 2023-11-27 05:46:05,973 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 05:46:08,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3754940.0, ans=0.0 2023-11-27 05:46:13,983 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563250 2023-11-27 05:46:17,010 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10150, loss[loss=0.06818, simple_loss=0.09123, pruned_loss=0.01362, audio_tagging_loss=0.008947, over 14824.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09092, pruned_loss=0.01216, audio_tagging_loss=0.008636, over 3056345.78 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:46:29,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3755073.3333333335, ans=0.125 2023-11-27 05:46:33,508 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.29 vs. limit=22.5 2023-11-27 05:46:38,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3755140.0, ans=0.0 2023-11-27 05:46:42,900 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 05:46:44,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3755140.0, ans=0.2 2023-11-27 05:46:48,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3755140.0, ans=0.05 2023-11-27 05:46:50,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3755206.6666666665, ans=0.1 2023-11-27 05:46:52,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3755206.6666666665, ans=0.125 2023-11-27 05:46:55,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3755206.6666666665, ans=0.0 2023-11-27 05:47:09,464 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563300 2023-11-27 05:47:12,622 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10200, loss[loss=0.05424, simple_loss=0.06906, pruned_loss=0.01115, audio_tagging_loss=0.008565, over 15284.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09138, pruned_loss=0.0121, audio_tagging_loss=0.008626, over 3057207.40 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:47:15,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3755340.0, ans=0.125 2023-11-27 05:47:32,903 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 05:47:42,392 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.16 vs. limit=15.0 2023-11-27 05:47:42,915 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.968e+01 9.089e+01 9.735e+01 1.032e+02 1.277e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-27 05:48:05,484 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.47 vs. limit=12.0 2023-11-27 05:48:05,880 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563350 2023-11-27 05:48:08,974 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10250, loss[loss=0.07837, simple_loss=0.105, pruned_loss=0.01655, audio_tagging_loss=0.009312, over 16012.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09075, pruned_loss=0.01211, audio_tagging_loss=0.008635, over 3059800.35 frames. 
], batch size: 61, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:48:20,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3755740.0, ans=0.1 2023-11-27 05:48:40,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3755806.6666666665, ans=0.0 2023-11-27 05:48:50,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3755873.3333333335, ans=0.0 2023-11-27 05:48:58,142 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=15.0 2023-11-27 05:49:01,499 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563400 2023-11-27 05:49:01,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3755940.0, ans=0.0 2023-11-27 05:49:04,800 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10300, loss[loss=0.08402, simple_loss=0.111, pruned_loss=0.01903, audio_tagging_loss=0.009475, over 14546.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08883, pruned_loss=0.01189, audio_tagging_loss=0.008854, over 3057770.84 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:49:06,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3756006.6666666665, ans=0.125 2023-11-27 05:49:19,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3756073.3333333335, ans=0.125 2023-11-27 05:49:27,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3756140.0, ans=0.0 2023-11-27 05:49:28,764 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:49:32,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3756140.0, ans=0.0 2023-11-27 05:49:34,848 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.807e+01 9.184e+01 9.721e+01 1.033e+02 1.459e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-27 05:49:40,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3756206.6666666665, ans=0.125 2023-11-27 05:49:51,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3756273.3333333335, ans=0.0 2023-11-27 05:49:56,096 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.07 vs. limit=15.0 2023-11-27 05:49:56,712 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563450 2023-11-27 05:50:00,361 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10350, loss[loss=0.0581, simple_loss=0.07524, pruned_loss=0.009671, audio_tagging_loss=0.01081, over 15929.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08814, pruned_loss=0.01184, audio_tagging_loss=0.008903, over 3048519.21 frames. 
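The Whitening records compare a per-module metric against a limit (metric=11.67 vs. limit=15.0 and so on). The metric measures how anisotropic the feature covariance is: it equals 1.0 for perfectly white features and grows as variance concentrates in a few directions, and the module only pushes back on the activations once the metric exceeds its limit. A sketch of one such metric, assuming the ratio-of-traces form and ignoring the num_groups splitting seen in the records (the authoritative definition is in scaling.py):

    import torch

    # Sketch of a whitening metric in the spirit of the scaling.py records:
    # d * trace(C^2) / trace(C)^2 over the channel covariance C, which is 1.0
    # when all eigenvalues are equal (white) and larger when they are not.
    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels)
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]
        d = cov.shape[0]
        return (d * (cov @ cov).trace() / cov.trace() ** 2).item()

    white = torch.randn(10000, 512)
    print(whitening_metric(white))   # close to 1.0 for white features; the
                                     # modules logged above sit between ~2 and ~19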
], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:50:00,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3756340.0, ans=0.125 2023-11-27 05:50:02,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3756340.0, ans=0.0 2023-11-27 05:50:21,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3756473.3333333335, ans=0.0 2023-11-27 05:50:28,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3756473.3333333335, ans=0.125 2023-11-27 05:50:30,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3756473.3333333335, ans=0.125 2023-11-27 05:50:52,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3756606.6666666665, ans=0.125 2023-11-27 05:50:53,249 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563500 2023-11-27 05:50:56,316 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10400, loss[loss=0.04871, simple_loss=0.06664, pruned_loss=0.008304, audio_tagging_loss=0.007092, over 13892.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.08751, pruned_loss=0.01188, audio_tagging_loss=0.008953, over 3050077.68 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:51:05,478 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.16 vs. limit=15.0 2023-11-27 05:51:09,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3756740.0, ans=0.125 2023-11-27 05:51:26,041 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.687e+01 8.939e+01 9.704e+01 1.060e+02 1.471e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-27 05:51:36,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3756873.3333333335, ans=0.0 2023-11-27 05:51:48,482 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563550 2023-11-27 05:51:50,888 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=15.0 2023-11-27 05:51:51,584 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10450, loss[loss=0.0773, simple_loss=0.1043, pruned_loss=0.01551, audio_tagging_loss=0.00963, over 17012.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08775, pruned_loss=0.01192, audio_tagging_loss=0.008946, over 3045006.53 frames. ], batch size: 65, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:51:53,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3757006.6666666665, ans=0.2 2023-11-27 05:52:10,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3757073.3333333335, ans=0.125 2023-11-27 05:52:19,760 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.45 vs. 
limit=15.0 2023-11-27 05:52:24,752 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:52:30,473 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.56 vs. limit=15.0 2023-11-27 05:52:36,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3757273.3333333335, ans=0.125 2023-11-27 05:52:40,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3757273.3333333335, ans=0.125 2023-11-27 05:52:41,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3757273.3333333335, ans=0.2 2023-11-27 05:52:43,721 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.26 vs. limit=15.0 2023-11-27 05:52:44,129 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563600 2023-11-27 05:52:47,994 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10500, loss[loss=0.06132, simple_loss=0.08384, pruned_loss=0.01101, audio_tagging_loss=0.008386, over 15260.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08777, pruned_loss=0.01201, audio_tagging_loss=0.008787, over 3041618.40 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:53:00,077 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.50 vs. limit=10.0 2023-11-27 05:53:17,363 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.844e+01 9.057e+01 9.549e+01 1.045e+02 1.272e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-27 05:53:34,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3757606.6666666665, ans=0.125 2023-11-27 05:53:40,866 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563650 2023-11-27 05:53:41,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3757606.6666666665, ans=0.125 2023-11-27 05:53:41,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3757606.6666666665, ans=6.0 2023-11-27 05:53:42,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3757673.3333333335, ans=0.2 2023-11-27 05:53:43,934 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10550, loss[loss=0.05444, simple_loss=0.06753, pruned_loss=0.01231, audio_tagging_loss=0.008359, over 15051.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08831, pruned_loss=0.01195, audio_tagging_loss=0.008653, over 3043612.26 frames. 
], batch size: 60, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:53:50,373 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:53:57,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3757740.0, ans=0.125 2023-11-27 05:54:04,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3757806.6666666665, ans=0.0 2023-11-27 05:54:36,105 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563700 2023-11-27 05:54:39,299 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10600, loss[loss=0.06966, simple_loss=0.1014, pruned_loss=0.01204, audio_tagging_loss=0.006927, over 15667.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.0889, pruned_loss=0.01193, audio_tagging_loss=0.00857, over 3050574.13 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:54:52,719 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:55:10,867 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.549e+01 8.995e+01 9.549e+01 1.050e+02 1.253e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-27 05:55:22,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3758273.3333333335, ans=0.125 2023-11-27 05:55:31,011 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563750 2023-11-27 05:55:31,160 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:55:34,660 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10650, loss[loss=0.05528, simple_loss=0.07474, pruned_loss=0.005067, audio_tagging_loss=0.01285, over 16114.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08928, pruned_loss=0.01205, audio_tagging_loss=0.008556, over 3046823.28 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:55:35,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3758340.0, ans=0.125 2023-11-27 05:55:44,877 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.34 vs. limit=10.0 2023-11-27 05:55:45,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3758406.6666666665, ans=0.125 2023-11-27 05:55:52,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3758406.6666666665, ans=0.1 2023-11-27 05:55:53,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3758406.6666666665, ans=0.0 2023-11-27 05:56:00,140 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.19 vs. limit=15.0 2023-11-27 05:56:02,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3758473.3333333335, ans=0.125 2023-11-27 05:56:07,558 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.43 vs. 
limit=6.0 2023-11-27 05:56:10,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3758540.0, ans=0.125 2023-11-27 05:56:14,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3758540.0, ans=0.2 2023-11-27 05:56:15,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3758540.0, ans=0.0 2023-11-27 05:56:17,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3758540.0, ans=0.125 2023-11-27 05:56:25,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3758606.6666666665, ans=0.0 2023-11-27 05:56:27,484 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563800 2023-11-27 05:56:31,173 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10700, loss[loss=0.05785, simple_loss=0.07181, pruned_loss=0.01074, audio_tagging_loss=0.0112, over 15594.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08885, pruned_loss=0.01185, audio_tagging_loss=0.008527, over 3054805.47 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:56:31,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3758673.3333333335, ans=0.125 2023-11-27 05:56:34,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3758673.3333333335, ans=0.125 2023-11-27 05:56:42,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3758740.0, ans=0.1 2023-11-27 05:56:58,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3758806.6666666665, ans=0.95 2023-11-27 05:57:01,231 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.230e+01 8.877e+01 9.430e+01 1.025e+02 1.516e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-27 05:57:18,729 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.34 vs. limit=22.5 2023-11-27 05:57:22,457 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563850 2023-11-27 05:57:25,454 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10750, loss[loss=0.06784, simple_loss=0.08701, pruned_loss=0.01498, audio_tagging_loss=0.009351, over 15842.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08931, pruned_loss=0.01191, audio_tagging_loss=0.008524, over 3055597.99 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:57:34,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3759006.6666666665, ans=0.125 2023-11-27 05:57:35,602 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.75 vs. 
limit=15.0 2023-11-27 05:57:38,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3759073.3333333335, ans=0.1 2023-11-27 05:57:51,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3759140.0, ans=0.125 2023-11-27 05:58:11,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3759273.3333333335, ans=0.0 2023-11-27 05:58:17,679 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563900 2023-11-27 05:58:20,827 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10800, loss[loss=0.06459, simple_loss=0.08458, pruned_loss=0.01121, audio_tagging_loss=0.01109, over 14964.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08961, pruned_loss=0.01208, audio_tagging_loss=0.00853, over 3054076.20 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:58:21,519 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.48 vs. limit=15.0 2023-11-27 05:58:22,079 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:58:48,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3759473.3333333335, ans=0.2 2023-11-27 05:58:52,700 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.507e+01 9.069e+01 9.752e+01 1.037e+02 1.652e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-27 05:59:14,066 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 563950 2023-11-27 05:59:17,796 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10850, loss[loss=0.07333, simple_loss=0.1072, pruned_loss=0.01266, audio_tagging_loss=0.007063, over 15890.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.0897, pruned_loss=0.0121, audio_tagging_loss=0.008462, over 3054990.33 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:59:24,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3759673.3333333335, ans=0.125 2023-11-27 05:59:34,963 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.77 vs. limit=15.0 2023-11-27 06:00:05,617 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.19 vs. limit=6.0 2023-11-27 06:00:10,328 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 06:00:10,392 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564000 2023-11-27 06:00:11,709 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-564000.pt 2023-11-27 06:00:15,712 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10900, loss[loss=0.07407, simple_loss=0.09597, pruned_loss=0.01691, audio_tagging_loss=0.009179, over 15269.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08997, pruned_loss=0.01219, audio_tagging_loss=0.0085, over 3057056.56 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 06:00:17,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3760006.6666666665, ans=0.125 2023-11-27 06:00:47,455 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.724e+01 8.933e+01 9.704e+01 1.050e+02 1.255e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-27 06:00:47,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3760140.0, ans=0.025 2023-11-27 06:00:51,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3760206.6666666665, ans=0.0 2023-11-27 06:00:55,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3760206.6666666665, ans=0.1 2023-11-27 06:01:03,709 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.93 vs. limit=22.5 2023-11-27 06:01:06,910 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.98 vs. limit=15.0 2023-11-27 06:01:07,494 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564050 2023-11-27 06:01:10,566 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 10950, loss[loss=0.07033, simple_loss=0.09729, pruned_loss=0.01472, audio_tagging_loss=0.006964, over 15378.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08929, pruned_loss=0.01203, audio_tagging_loss=0.008584, over 3056931.05 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:01:41,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3760473.3333333335, ans=0.0 2023-11-27 06:01:50,300 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2023-11-27 06:01:53,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3760540.0, ans=0.125 2023-11-27 06:02:03,197 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564100 2023-11-27 06:02:06,787 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11000, loss[loss=0.06117, simple_loss=0.0835, pruned_loss=0.01095, audio_tagging_loss=0.008465, over 16403.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08872, pruned_loss=0.01202, audio_tagging_loss=0.008644, over 3058562.23 frames. ], batch size: 62, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:02:15,274 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 06:02:15,877 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.02 vs. limit=22.5 2023-11-27 06:02:16,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3760673.3333333335, ans=0.04949747468305833 2023-11-27 06:02:20,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3760740.0, ans=0.125 2023-11-27 06:02:24,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3760740.0, ans=0.125 2023-11-27 06:02:28,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3760806.6666666665, ans=0.0 2023-11-27 06:02:38,495 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.912e+01 8.834e+01 9.840e+01 1.033e+02 1.285e+02, threshold=1.968e+02, percent-clipped=0.0 2023-11-27 06:02:38,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3760873.3333333335, ans=0.1 2023-11-27 06:02:54,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3760940.0, ans=0.125 2023-11-27 06:02:59,698 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564150 2023-11-27 06:03:02,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3761006.6666666665, ans=0.125 2023-11-27 06:03:02,826 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11050, loss[loss=0.06037, simple_loss=0.0747, pruned_loss=0.01367, audio_tagging_loss=0.009353, over 14433.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.0892, pruned_loss=0.01206, audio_tagging_loss=0.008697, over 3060668.73 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:03:07,579 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.65 vs. limit=15.0 2023-11-27 06:03:29,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3761140.0, ans=0.1 2023-11-27 06:03:47,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3761273.3333333335, ans=0.125 2023-11-27 06:03:53,410 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.98 vs. limit=15.0 2023-11-27 06:03:54,083 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564200 2023-11-27 06:03:57,491 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11100, loss[loss=0.0931, simple_loss=0.1345, pruned_loss=0.01864, audio_tagging_loss=0.00722, over 15189.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.0889, pruned_loss=0.0121, audio_tagging_loss=0.008774, over 3058218.98 frames. 
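Note on the WARNING [train_asr.py:1481] entries above: the excluded AudioSet cuts carry a dummy transcript, and a 1-second cut's 100 feature frames shrink to 23 after front-end subsampling, fewer than the 24 BPE tokens of that dummy text; a transducer loss cannot emit more tokens than it has encoder frames, so the cut is dropped. A sketch of such a filter, assuming (num_frames - 7) // 4 for the subsampled length, which reproduces the logged 100 -> 23 but may not be the exact front-end arithmetic:

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Encoder frames after ~4x convolutional subsampling (assumed
        # formula; it matches the logged 100 -> 23).
        frames_after_subsampling = (num_frames - 7) // 4
        return frames_after_subsampling >= num_tokens

    # Logged case: keep_cut(100, 24) -> False, hence the exclusion.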
], batch size: 53, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:04:30,089 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.469e+01 9.171e+01 9.681e+01 1.044e+02 1.229e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-27 06:04:30,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3761540.0, ans=0.1 2023-11-27 06:04:35,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3761540.0, ans=0.125 2023-11-27 06:04:49,833 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564250 2023-11-27 06:04:52,950 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11150, loss[loss=0.06808, simple_loss=0.09962, pruned_loss=0.01133, audio_tagging_loss=0.006941, over 16469.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08916, pruned_loss=0.0121, audio_tagging_loss=0.008829, over 3058903.91 frames. ], batch size: 62, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:04:56,066 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.31 vs. limit=10.0 2023-11-27 06:05:21,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3761806.6666666665, ans=0.125 2023-11-27 06:05:25,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3761873.3333333335, ans=0.125 2023-11-27 06:05:28,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3761873.3333333335, ans=0.0 2023-11-27 06:05:29,983 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.87 vs. limit=22.5 2023-11-27 06:05:46,549 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564300 2023-11-27 06:05:49,604 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11200, loss[loss=0.06081, simple_loss=0.08669, pruned_loss=0.008287, audio_tagging_loss=0.00918, over 16042.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08943, pruned_loss=0.01216, audio_tagging_loss=0.008906, over 3062899.64 frames. 
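Note on the recurring scaling.py:213 ScheduledFloat entries: values such as dropout_p, skip rates and balancer probs are not constants but functions of the global batch count, and by batch_count ~3.76e6 they have all reached their final plateau (ans=0.125, ans=0.0, ans=0.1, ...). A hedged sketch of a piecewise-linear schedule in that spirit; the class name matches the log, the interpolation body is illustrative:

    import bisect

    class ScheduledFloat:
        # Linearly interpolates between (batch_count, value) breakpoints
        # and clamps beyond the first/last one.
        def __init__(self, *points):
            self.xs = [x for x, _ in points]
            self.ys = [y for _, y in points]

        def value(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

    # e.g. ScheduledFloat((0.0, 0.3), (20000.0, 0.125)).value(3757740.0)
    # -> 0.125, the plateau value seen throughout this stretch.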
], batch size: 59, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 06:06:13,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3762140.0, ans=0.0 2023-11-27 06:06:20,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3762140.0, ans=0.125 2023-11-27 06:06:22,458 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.923e+01 9.248e+01 9.841e+01 1.044e+02 1.353e+02, threshold=1.968e+02, percent-clipped=0.0 2023-11-27 06:06:27,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3762206.6666666665, ans=0.125 2023-11-27 06:06:37,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3762273.3333333335, ans=0.125 2023-11-27 06:06:41,702 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564350 2023-11-27 06:06:44,804 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11250, loss[loss=0.05699, simple_loss=0.07401, pruned_loss=0.01168, audio_tagging_loss=0.008298, over 14824.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08802, pruned_loss=0.01192, audio_tagging_loss=0.008923, over 3064632.38 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:06:53,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3762340.0, ans=0.0 2023-11-27 06:06:54,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3762406.6666666665, ans=0.2 2023-11-27 06:06:56,594 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.88 vs. limit=10.0 2023-11-27 06:06:57,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3762406.6666666665, ans=10.0 2023-11-27 06:07:09,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3762473.3333333335, ans=0.125 2023-11-27 06:07:29,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3762606.6666666665, ans=0.1 2023-11-27 06:07:36,577 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564400 2023-11-27 06:07:36,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3762606.6666666665, ans=0.125 2023-11-27 06:07:40,540 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11300, loss[loss=0.06106, simple_loss=0.08475, pruned_loss=0.01011, audio_tagging_loss=0.008575, over 14320.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08862, pruned_loss=0.0119, audio_tagging_loss=0.008723, over 3058671.28 frames. 
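Note on the optim.py:476 lines: they report quartiles (min/25%/median/75%/max) of recent whole-model gradient norms, and the clipping threshold is exactly Clipping_scale=2.0 times the logged median (e.g. 2 x 9.681e+01 = 1.936e+02 a few records up); percent-clipped=0.0 throughout this stretch means no batch exceeded it. A simplified sketch of that bookkeeping; the buffer length and names are illustrative:

    import torch

    class GradNormClipper:
        def __init__(self, clipping_scale: float = 2.0, history: int = 1000):
            self.scale = clipping_scale
            self.history = history
            self.norms: list[float] = []
            self.seen = 0
            self.clipped = 0

        def clip_(self, params) -> float:
            # Whole-model gradient norm for this batch.
            norm = torch.norm(torch.stack(
                [p.grad.norm() for p in params if p.grad is not None])).item()
            self.norms = (self.norms + [norm])[-self.history:]
            median = sorted(self.norms)[len(self.norms) // 2]
            threshold = self.scale * median
            self.seen += 1
            if norm > threshold:
                self.clipped += 1
                for p in params:
                    if p.grad is not None:
                        p.grad.mul_(threshold / norm)
            return threshold  # percent-clipped = 100 * clipped / seen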
], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:07:49,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3762673.3333333335, ans=0.125 2023-11-27 06:07:59,149 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:08:13,714 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.519e+01 9.028e+01 9.588e+01 1.026e+02 1.216e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 06:08:21,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3762873.3333333335, ans=0.1 2023-11-27 06:08:27,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3762940.0, ans=0.125 2023-11-27 06:08:33,477 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564450 2023-11-27 06:08:36,676 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11350, loss[loss=0.0495, simple_loss=0.0646, pruned_loss=0.009303, audio_tagging_loss=0.007899, over 14436.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08962, pruned_loss=0.0121, audio_tagging_loss=0.008593, over 3051129.78 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:08:40,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3763006.6666666665, ans=0.125 2023-11-27 06:08:48,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3763073.3333333335, ans=0.0 2023-11-27 06:08:50,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3763073.3333333335, ans=0.5 2023-11-27 06:08:51,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3763073.3333333335, ans=0.125 2023-11-27 06:09:18,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3763206.6666666665, ans=0.0 2023-11-27 06:09:21,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3763273.3333333335, ans=0.0 2023-11-27 06:09:27,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3763273.3333333335, ans=0.1 2023-11-27 06:09:28,880 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564500 2023-11-27 06:09:31,968 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11400, loss[loss=0.04818, simple_loss=0.05867, pruned_loss=0.008452, audio_tagging_loss=0.01039, over 15246.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.09018, pruned_loss=0.01212, audio_tagging_loss=0.008459, over 3045536.30 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:09:40,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3763340.0, ans=0.125 2023-11-27 06:09:46,718 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.44 vs. 
limit=10.0 2023-11-27 06:10:05,120 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.669e+01 9.070e+01 9.706e+01 1.041e+02 1.301e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-27 06:10:07,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3763540.0, ans=0.0 2023-11-27 06:10:23,652 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564550 2023-11-27 06:10:27,262 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11450, loss[loss=0.07071, simple_loss=0.1061, pruned_loss=0.01098, audio_tagging_loss=0.006667, over 14642.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.09066, pruned_loss=0.01211, audio_tagging_loss=0.008393, over 3052716.19 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:10:31,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3763673.3333333335, ans=0.125 2023-11-27 06:10:31,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3763673.3333333335, ans=0.125 2023-11-27 06:10:42,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3763740.0, ans=0.07 2023-11-27 06:10:55,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3763806.6666666665, ans=0.125 2023-11-27 06:10:57,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3763806.6666666665, ans=0.5 2023-11-27 06:11:19,518 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564600 2023-11-27 06:11:23,099 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11500, loss[loss=0.07021, simple_loss=0.09485, pruned_loss=0.01383, audio_tagging_loss=0.008954, over 15261.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.09038, pruned_loss=0.012, audio_tagging_loss=0.00845, over 3054883.82 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:11:39,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3764073.3333333335, ans=0.2 2023-11-27 06:11:41,112 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.16 vs. 
limit=15.0 2023-11-27 06:11:50,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3764140.0, ans=0.2 2023-11-27 06:11:51,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3764140.0, ans=0.0 2023-11-27 06:11:52,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3764140.0, ans=0.0 2023-11-27 06:11:55,857 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.231e+01 8.975e+01 9.589e+01 1.045e+02 1.244e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 06:12:11,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3764273.3333333335, ans=0.125 2023-11-27 06:12:14,885 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564650 2023-11-27 06:12:18,037 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11550, loss[loss=0.06846, simple_loss=0.09648, pruned_loss=0.01324, audio_tagging_loss=0.006975, over 16823.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.09007, pruned_loss=0.01201, audio_tagging_loss=0.008463, over 3055166.14 frames. ], batch size: 62, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 06:12:18,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3764340.0, ans=0.125 2023-11-27 06:12:28,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3764406.6666666665, ans=0.0 2023-11-27 06:12:29,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3764406.6666666665, ans=0.2 2023-11-27 06:12:35,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3764406.6666666665, ans=0.2 2023-11-27 06:12:52,045 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 06:13:10,507 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564700 2023-11-27 06:13:13,679 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11600, loss[loss=0.08067, simple_loss=0.1078, pruned_loss=0.01638, audio_tagging_loss=0.01039, over 14957.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08968, pruned_loss=0.01199, audio_tagging_loss=0.008421, over 3056247.25 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:13:32,936 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.39 vs. 
limit=15.0 2023-11-27 06:13:39,899 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:13:40,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3764806.6666666665, ans=0.1 2023-11-27 06:13:43,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3764806.6666666665, ans=0.125 2023-11-27 06:13:46,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3764873.3333333335, ans=0.5 2023-11-27 06:13:48,657 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.716e+01 9.097e+01 9.719e+01 1.053e+02 1.388e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-27 06:13:52,262 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.42 vs. limit=15.0 2023-11-27 06:14:06,629 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564750 2023-11-27 06:14:09,727 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11650, loss[loss=0.06214, simple_loss=0.08824, pruned_loss=0.009468, audio_tagging_loss=0.008552, over 15637.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.0902, pruned_loss=0.01197, audio_tagging_loss=0.008377, over 3062174.73 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:14:35,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3765140.0, ans=0.2 2023-11-27 06:14:48,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3765206.6666666665, ans=0.125 2023-11-27 06:14:57,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3765273.3333333335, ans=0.0 2023-11-27 06:15:01,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3765273.3333333335, ans=0.1 2023-11-27 06:15:02,239 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564800 2023-11-27 06:15:05,663 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11700, loss[loss=0.0346, simple_loss=0.04062, pruned_loss=0.004073, audio_tagging_loss=0.01021, over 15969.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.089, pruned_loss=0.01185, audio_tagging_loss=0.008485, over 3054609.21 frames. ], batch size: 65, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:15:27,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3765473.3333333335, ans=0.125 2023-11-27 06:15:27,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3765473.3333333335, ans=0.125 2023-11-27 06:15:40,830 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.747e+01 8.890e+01 9.642e+01 1.040e+02 1.676e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-27 06:15:41,410 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.62 vs. 
limit=6.0 2023-11-27 06:15:58,272 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564850 2023-11-27 06:16:00,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3765673.3333333335, ans=0.125 2023-11-27 06:16:01,363 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11750, loss[loss=0.05839, simple_loss=0.07745, pruned_loss=0.01018, audio_tagging_loss=0.009476, over 15900.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08863, pruned_loss=0.01199, audio_tagging_loss=0.008596, over 3057336.02 frames. ], batch size: 62, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:16:52,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3765940.0, ans=0.1 2023-11-27 06:16:54,068 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564900 2023-11-27 06:16:57,159 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11800, loss[loss=0.08539, simple_loss=0.1189, pruned_loss=0.01658, audio_tagging_loss=0.00938, over 15088.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08922, pruned_loss=0.01213, audio_tagging_loss=0.008552, over 3057768.73 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:17:31,470 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.000e+01 8.890e+01 9.734e+01 1.045e+02 1.276e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-27 06:17:31,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3766206.6666666665, ans=0.125 2023-11-27 06:17:32,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3766206.6666666665, ans=0.125 2023-11-27 06:17:40,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3766273.3333333335, ans=0.0 2023-11-27 06:17:45,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3766273.3333333335, ans=0.0 2023-11-27 06:17:49,604 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 564950 2023-11-27 06:17:52,737 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11850, loss[loss=0.05357, simple_loss=0.07921, pruned_loss=0.005755, audio_tagging_loss=0.008209, over 15752.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08933, pruned_loss=0.01218, audio_tagging_loss=0.008577, over 3051125.18 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:18:16,692 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.70 vs. limit=15.0 2023-11-27 06:18:19,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=3766473.3333333335, ans=0.2 2023-11-27 06:18:44,767 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565000 2023-11-27 06:18:48,173 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11900, loss[loss=0.04618, simple_loss=0.06416, pruned_loss=0.006876, audio_tagging_loss=0.007219, over 15075.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08896, pruned_loss=0.01185, audio_tagging_loss=0.008663, over 3048632.58 frames. 
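Note on the scaling.py:1022 Whitening lines: the metric compares the channel covariance of a module's activations against a scaled identity; it equals 1.0 for perfectly decorrelated ("white") features and grows as channels correlate, and the module only applies its corrective gradient when the metric exceeds the limit (every line in this stretch, e.g. metric=4.62 vs. limit=6.0, stays under it). An approximate single-group reimplementation of the metric:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels) activations for one group.
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]          # (C, C) channel covariance
        num_channels = cov.shape[0]
        mean_diag = cov.diagonal().mean()
        # 1.0 iff cov is a multiple of the identity; larger otherwise.
        return float((cov ** 2).sum() / (mean_diag ** 2 * num_channels))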
], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:19:01,894 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.79 vs. limit=15.0 2023-11-27 06:19:03,291 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.61 vs. limit=5.0 2023-11-27 06:19:04,981 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.21 vs. limit=12.0 2023-11-27 06:19:23,228 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.850e+01 8.781e+01 9.449e+01 1.024e+02 1.260e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-27 06:19:31,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=3766873.3333333335, ans=0.1 2023-11-27 06:19:32,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3766940.0, ans=0.125 2023-11-27 06:19:34,979 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.64 vs. limit=22.5 2023-11-27 06:19:39,590 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.42 vs. limit=15.0 2023-11-27 06:19:41,258 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565050 2023-11-27 06:19:44,902 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 11950, loss[loss=0.07466, simple_loss=0.1083, pruned_loss=0.01447, audio_tagging_loss=0.006018, over 14905.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08896, pruned_loss=0.01173, audio_tagging_loss=0.008743, over 3045706.26 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:19:46,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3767006.6666666665, ans=0.125 2023-11-27 06:19:57,616 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.33 vs. limit=15.0 2023-11-27 06:20:06,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3767140.0, ans=0.95 2023-11-27 06:20:19,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3767206.6666666665, ans=0.04949747468305833 2023-11-27 06:20:24,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3767206.6666666665, ans=0.0 2023-11-27 06:20:27,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3767273.3333333335, ans=0.1 2023-11-27 06:20:29,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3767273.3333333335, ans=0.0 2023-11-27 06:20:35,364 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565100 2023-11-27 06:20:38,327 INFO [train_asr.py:1235] (0/4) Epoch 47, batch 12000, loss[loss=0.06143, simple_loss=0.08062, pruned_loss=0.009969, audio_tagging_loss=0.01115, over 15745.00 frames. 
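Note on the grad_scale field (8.0, 16.0 and 32.0 across this stretch): it is the dynamic loss scale of fp16 mixed-precision training, halved whenever a step produces inf/nan gradients and doubled back after a run of clean steps, hence the oscillation between powers of two. A minimal AMP loop showing where that number comes from; the toy model and growth_interval are placeholders:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)
    model = torch.nn.Linear(80, 500).cuda()
    optimizer = torch.optim.Adam(model.parameters())

    x = torch.randn(8, 80, device="cuda")
    with torch.cuda.amp.autocast():
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()   # gradients carry the loss scale
    scaler.step(optimizer)          # skipped if the grads overflowed
    scaler.update()                 # halves or (periodically) doubles the scale
    print(scaler.get_scale())       # the number logged as grad_scale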
], tot_loss[loss=0.06527, simple_loss=0.08918, pruned_loss=0.01178, audio_tagging_loss=0.008904, over 3044526.41 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 06:20:38,329 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-27 06:20:53,484 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.5384, 3.1499, 4.8407, 3.0845], device='cuda:0') 2023-11-27 06:21:10,516 INFO [train_asr.py:1267] (0/4) Epoch 47, validation: loss=0.0578, simple_loss=0.05045, pruned_loss=0.005285, audio_tagging_loss=0.02729, over 4681554.00 frames. 2023-11-27 06:21:10,516 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-27 06:21:17,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3767340.0, ans=0.0 2023-11-27 06:21:24,185 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.23 vs. limit=15.0 2023-11-27 06:21:34,791 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-47.pt 2023-11-27 06:22:01,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3767493.3333333335, ans=0.125 2023-11-27 06:22:02,637 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 0, loss[loss=0.06702, simple_loss=0.08664, pruned_loss=0.004589, audio_tagging_loss=0.01911, over 15082.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.08664, pruned_loss=0.004589, audio_tagging_loss=0.01911, over 15082.00 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 06:22:02,639 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-27 06:22:20,239 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1075, 2.4172, 5.0403, 2.9831], device='cuda:0') 2023-11-27 06:22:25,530 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.4576, 3.8171, 3.1385, 3.7915], device='cuda:0') 2023-11-27 06:22:30,758 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9616, 3.1196, 2.9071, 3.1810, 3.3726, 2.7228, 3.4024, 2.5269], device='cuda:0') 2023-11-27 06:22:32,214 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.1697, 2.7607, 1.7184, 2.6480, 3.2752, 3.2456, 3.2239, 3.5189], device='cuda:0') 2023-11-27 06:22:33,987 INFO [train_asr.py:1267] (0/4) Epoch 48, validation: loss=0.05791, simple_loss=0.05045, pruned_loss=0.005281, audio_tagging_loss=0.02741, over 4681554.00 frames. 2023-11-27 06:22:33,987 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-27 06:22:40,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3767493.3333333335, ans=0.0 2023-11-27 06:22:43,062 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.08 vs. 
limit=22.5 2023-11-27 06:22:43,369 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.387e+01 9.223e+01 9.944e+01 1.084e+02 1.467e+02, threshold=1.989e+02, percent-clipped=0.0 2023-11-27 06:22:46,017 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.92 vs. limit=6.0 2023-11-27 06:22:52,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3767560.0, ans=0.125 2023-11-27 06:23:00,865 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565150 2023-11-27 06:23:30,011 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 50, loss[loss=0.08107, simple_loss=0.1008, pruned_loss=0.01588, audio_tagging_loss=0.01481, over 15848.00 frames. ], tot_loss[loss=0.0737, simple_loss=0.09008, pruned_loss=0.01189, audio_tagging_loss=0.01677, over 683632.66 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:23:39,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3767826.6666666665, ans=0.0 2023-11-27 06:23:50,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3767960.0, ans=0.125 2023-11-27 06:23:56,512 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565200 2023-11-27 06:24:02,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3768026.6666666665, ans=0.0 2023-11-27 06:24:20,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3768093.3333333335, ans=0.0 2023-11-27 06:24:24,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3768093.3333333335, ans=0.0 2023-11-27 06:24:26,095 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 100, loss[loss=0.07472, simple_loss=0.09927, pruned_loss=0.01137, audio_tagging_loss=0.01372, over 14998.00 frames. ], tot_loss[loss=0.07321, simple_loss=0.08994, pruned_loss=0.01223, audio_tagging_loss=0.016, over 1208858.82 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:24:27,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3768160.0, ans=0.125 2023-11-27 06:24:27,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3768160.0, ans=0.125 2023-11-27 06:24:33,148 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.17 vs. limit=10.0 2023-11-27 06:24:35,730 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.383e+01 1.012e+02 1.072e+02 1.151e+02 1.382e+02, threshold=2.144e+02, percent-clipped=0.0 2023-11-27 06:24:48,915 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.16 vs. 
limit=12.0 2023-11-27 06:24:52,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3768293.3333333335, ans=0.09899494936611666 2023-11-27 06:24:52,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3768293.3333333335, ans=0.125 2023-11-27 06:24:53,260 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565250 2023-11-27 06:25:03,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3768360.0, ans=0.0 2023-11-27 06:25:13,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3768426.6666666665, ans=0.125 2023-11-27 06:25:21,931 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 150, loss[loss=0.07294, simple_loss=0.09698, pruned_loss=0.01541, audio_tagging_loss=0.009042, over 16077.00 frames. ], tot_loss[loss=0.07061, simple_loss=0.08866, pruned_loss=0.01196, audio_tagging_loss=0.01432, over 1614062.24 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:25:25,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3768493.3333333335, ans=0.1 2023-11-27 06:25:28,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3768493.3333333335, ans=0.1 2023-11-27 06:25:32,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3768560.0, ans=0.2 2023-11-27 06:25:35,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3768560.0, ans=0.0 2023-11-27 06:25:40,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3768560.0, ans=0.125 2023-11-27 06:25:49,140 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565300 2023-11-27 06:25:51,941 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.88 vs. limit=15.0 2023-11-27 06:26:05,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3768760.0, ans=0.1 2023-11-27 06:26:12,507 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.89 vs. limit=15.0 2023-11-27 06:26:18,036 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 200, loss[loss=0.06642, simple_loss=0.09341, pruned_loss=0.01296, audio_tagging_loss=0.006749, over 15395.00 frames. ], tot_loss[loss=0.06832, simple_loss=0.08751, pruned_loss=0.01177, audio_tagging_loss=0.01279, over 1933361.88 frames. 
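Note on the epoch boundary visible above: at batch 12000 of epoch 47 the loop pauses to compute a full validation loss (train_asr.py:1258/1267, over 4681554.00 frames), saves epoch-47.pt, then starts epoch 48 with a fresh validation pass; independently, rolling checkpoint-<batch_idx>.pt files (checkpoint-564000.pt earlier in this stretch) are written at a fixed batch interval. A sketch of that cadence; the 4000-batch interval and the payload keys are assumptions:

    import torch
    from pathlib import Path

    def maybe_save(model, batch_idx: int, epoch: int, exp_dir: Path,
                   every_n: int = 4000, epoch_done: bool = False) -> None:
        state = {"model": model.state_dict(),
                 "batch_idx_train": batch_idx,
                 "epoch": epoch}
        if batch_idx > 0 and batch_idx % every_n == 0:
            # Rolling mid-epoch checkpoint, e.g. checkpoint-564000.pt.
            torch.save(state, exp_dir / f"checkpoint-{batch_idx}.pt")
        if epoch_done:
            # Epoch-end save, written right after the validation pass.
            torch.save(state, exp_dir / f"epoch-{epoch}.pt")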
], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:26:28,099 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.314e+01 9.239e+01 9.831e+01 1.046e+02 1.283e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-27 06:26:31,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3768893.3333333335, ans=0.0 2023-11-27 06:26:39,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3768960.0, ans=0.125 2023-11-27 06:26:44,276 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565350 2023-11-27 06:26:50,368 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2023-11-27 06:27:05,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3769093.3333333335, ans=0.2 2023-11-27 06:27:05,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.whiten.whitening_limit, batch_count=3769093.3333333335, ans=12.0 2023-11-27 06:27:06,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3769093.3333333335, ans=0.1 2023-11-27 06:27:10,144 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.72 vs. limit=10.0 2023-11-27 06:27:13,819 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 250, loss[loss=0.06727, simple_loss=0.08821, pruned_loss=0.01275, audio_tagging_loss=0.01041, over 15515.00 frames. ], tot_loss[loss=0.06793, simple_loss=0.08876, pruned_loss=0.01197, audio_tagging_loss=0.01158, over 2173542.11 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:27:16,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3769160.0, ans=0.0 2023-11-27 06:27:17,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3769160.0, ans=0.1 2023-11-27 06:27:40,234 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565400 2023-11-27 06:27:48,650 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.17 vs. limit=12.0 2023-11-27 06:27:56,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3769360.0, ans=0.1 2023-11-27 06:28:05,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3769426.6666666665, ans=0.125 2023-11-27 06:28:09,442 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 300, loss[loss=0.05938, simple_loss=0.09143, pruned_loss=0.006229, audio_tagging_loss=0.007437, over 14961.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.08906, pruned_loss=0.01192, audio_tagging_loss=0.01073, over 2360645.66 frames. 
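Note on the tot_loss[... over N frames] figures: they are not per-epoch averages. The frame window resets at each epoch start (Epoch 48, batch 0 reports only its own 15082 frames), refills over the first few hundred batches (683632 -> 1208858 -> 1614062 -> 1933361 -> 2173542 frames in the lines above) and saturates near ~3.05e6 frames as in epoch 47, i.e. a frame-weighted running average with geometric decay. A sketch of such a tracker; the decay constant (about 1 - 1/200, inferred from the observed plateau) is an assumption:

    class RunningLoss:
        def __init__(self, decay: float = 1.0 - 1.0 / 200):
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> None:
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames

        @property
        def avg(self) -> float:
            # The value printed as tot_loss[... over self.frames frames].
            return self.loss_sum / max(self.frames, 1.0)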
], batch size: 57, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 06:28:20,485 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.290e+01 9.040e+01 9.670e+01 1.035e+02 1.237e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-27 06:28:33,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3769626.6666666665, ans=0.125 2023-11-27 06:28:37,041 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565450 2023-11-27 06:29:05,745 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 350, loss[loss=0.07452, simple_loss=0.1052, pruned_loss=0.01237, audio_tagging_loss=0.009542, over 14792.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.08982, pruned_loss=0.01208, audio_tagging_loss=0.01015, over 2519170.44 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 06:29:05,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3769826.6666666665, ans=0.1 2023-11-27 06:29:32,232 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565500 2023-11-27 06:29:42,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3770026.6666666665, ans=0.2 2023-11-27 06:29:53,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3770093.3333333335, ans=0.125 2023-11-27 06:29:54,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3770093.3333333335, ans=0.0 2023-11-27 06:30:01,383 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 400, loss[loss=0.04613, simple_loss=0.06051, pruned_loss=0.006731, audio_tagging_loss=0.009148, over 15080.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.08919, pruned_loss=0.01198, audio_tagging_loss=0.009821, over 2636625.81 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:30:04,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3770160.0, ans=0.125 2023-11-27 06:30:08,416 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.20 vs. limit=15.0 2023-11-27 06:30:11,977 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.578e+01 8.851e+01 9.492e+01 1.029e+02 1.198e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-27 06:30:27,506 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565550 2023-11-27 06:30:30,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3770293.3333333335, ans=0.125 2023-11-27 06:30:56,995 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 450, loss[loss=0.04399, simple_loss=0.06175, pruned_loss=0.004984, audio_tagging_loss=0.008128, over 15804.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.08975, pruned_loss=0.01217, audio_tagging_loss=0.009438, over 2728846.55 frames. 
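Note on the lr field: it holds at 1.43e-03 through epoch 47 and steps to 1.41e-03 exactly when epoch 48 begins, consistent with an Eden-style schedule that decays smoothly in both the batch index and the epoch index. A sketch whose constants (base_lr=0.045, lr_batches=7500, lr_epochs=3.5) and epoch indexing were chosen to reproduce both logged values, and are therefore assumptions about this run:

    def eden_lr(base_lr: float, batch: int, epoch: int,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # eden_lr(0.045, 564000, 46) ~= 1.43e-03  (the epoch-47 lines)
    # eden_lr(0.045, 565600, 47) ~= 1.41e-03  (the epoch-48 lines)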
], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:31:24,684 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565600 2023-11-27 06:31:26,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3770626.6666666665, ans=0.125 2023-11-27 06:31:39,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3770693.3333333335, ans=0.0 2023-11-27 06:31:44,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3770760.0, ans=0.125 2023-11-27 06:31:47,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3770760.0, ans=0.05 2023-11-27 06:31:53,045 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 500, loss[loss=0.08, simple_loss=0.1063, pruned_loss=0.01751, audio_tagging_loss=0.009348, over 14405.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.0894, pruned_loss=0.01221, audio_tagging_loss=0.00933, over 2799733.32 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:31:55,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3770826.6666666665, ans=0.125 2023-11-27 06:32:00,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3770826.6666666665, ans=0.0 2023-11-27 06:32:05,263 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 9.003e+01 9.804e+01 1.033e+02 1.335e+02, threshold=1.961e+02, percent-clipped=0.0 2023-11-27 06:32:08,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3770893.3333333335, ans=0.125 2023-11-27 06:32:18,204 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:32:20,173 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565650 2023-11-27 06:32:20,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3770960.0, ans=0.125 2023-11-27 06:32:23,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3770960.0, ans=0.125 2023-11-27 06:32:32,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3771026.6666666665, ans=0.125 2023-11-27 06:32:42,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3771093.3333333335, ans=0.125 2023-11-27 06:32:50,024 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 550, loss[loss=0.06518, simple_loss=0.07991, pruned_loss=0.01437, audio_tagging_loss=0.01086, over 14309.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08891, pruned_loss=0.01206, audio_tagging_loss=0.009097, over 2861160.64 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:32:53,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3771160.0, ans=0.125 2023-11-27 06:33:02,395 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.04 vs. 
limit=22.5 2023-11-27 06:33:16,293 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565700 2023-11-27 06:33:29,628 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.14 vs. limit=22.5 2023-11-27 06:33:30,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3771360.0, ans=0.125 2023-11-27 06:33:32,135 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=15.0 2023-11-27 06:33:44,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3771493.3333333335, ans=0.125 2023-11-27 06:33:45,510 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 600, loss[loss=0.05941, simple_loss=0.0816, pruned_loss=0.009601, audio_tagging_loss=0.009012, over 15277.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.0883, pruned_loss=0.01178, audio_tagging_loss=0.00902, over 2913123.40 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:33:49,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3771493.3333333335, ans=0.125 2023-11-27 06:33:56,619 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 8.995e+01 9.614e+01 1.020e+02 1.289e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-27 06:34:11,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3771626.6666666665, ans=0.1 2023-11-27 06:34:12,572 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565750 2023-11-27 06:34:16,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3771626.6666666665, ans=0.04949747468305833 2023-11-27 06:34:31,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3771760.0, ans=0.0 2023-11-27 06:34:41,268 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 650, loss[loss=0.05963, simple_loss=0.08482, pruned_loss=0.00738, audio_tagging_loss=0.009839, over 14951.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08821, pruned_loss=0.01181, audio_tagging_loss=0.009072, over 2940545.94 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:34:43,956 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.14 vs. limit=15.0 2023-11-27 06:35:08,423 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565800 2023-11-27 06:35:25,636 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.61 vs. limit=15.0 2023-11-27 06:35:32,155 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:35:36,380 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.82 vs. limit=15.0 2023-11-27 06:35:37,984 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 700, loss[loss=0.06648, simple_loss=0.0929, pruned_loss=0.01123, audio_tagging_loss=0.008799, over 15258.00 frames. 
], tot_loss[loss=0.06504, simple_loss=0.08844, pruned_loss=0.01183, audio_tagging_loss=0.008991, over 2965957.75 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:35:47,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3772160.0, ans=0.125 2023-11-27 06:35:49,135 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.837e+01 8.967e+01 9.614e+01 1.031e+02 1.404e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-27 06:35:54,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3772226.6666666665, ans=0.125 2023-11-27 06:36:04,688 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565850 2023-11-27 06:36:06,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3772293.3333333335, ans=0.1 2023-11-27 06:36:08,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3772293.3333333335, ans=0.125 2023-11-27 06:36:11,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3772360.0, ans=0.0 2023-11-27 06:36:16,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3772360.0, ans=0.125 2023-11-27 06:36:33,925 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 750, loss[loss=0.07481, simple_loss=0.1016, pruned_loss=0.01166, audio_tagging_loss=0.01234, over 15475.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08964, pruned_loss=0.0121, audio_tagging_loss=0.008942, over 2992217.68 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:36:35,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3772493.3333333335, ans=0.0 2023-11-27 06:36:48,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3772560.0, ans=0.125 2023-11-27 06:36:48,877 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.06 vs. limit=15.0 2023-11-27 06:36:56,777 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.51 vs. limit=15.0 2023-11-27 06:37:01,093 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565900 2023-11-27 06:37:04,924 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.79 vs. limit=22.5 2023-11-27 06:37:18,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3772760.0, ans=0.1 2023-11-27 06:37:19,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3772760.0, ans=0.125 2023-11-27 06:37:23,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3772760.0, ans=0.1 2023-11-27 06:37:29,623 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 800, loss[loss=0.07772, simple_loss=0.1052, pruned_loss=0.01512, audio_tagging_loss=0.01001, over 15365.00 frames. 
], tot_loss[loss=0.06642, simple_loss=0.09048, pruned_loss=0.01224, audio_tagging_loss=0.00894, over 3004120.08 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 06:37:40,738 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.719e+01 9.165e+01 9.801e+01 1.067e+02 1.276e+02, threshold=1.960e+02, percent-clipped=0.0 2023-11-27 06:37:40,947 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:37:56,673 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 565950 2023-11-27 06:38:07,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3773026.6666666665, ans=0.125 2023-11-27 06:38:13,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3773093.3333333335, ans=0.1 2023-11-27 06:38:26,134 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 850, loss[loss=0.05524, simple_loss=0.07487, pruned_loss=0.009271, audio_tagging_loss=0.008534, over 15517.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09019, pruned_loss=0.01221, audio_tagging_loss=0.008978, over 3011736.23 frames. ], batch size: 63, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:38:52,607 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566000 2023-11-27 06:38:52,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3773293.3333333335, ans=0.125 2023-11-27 06:39:04,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3773360.0, ans=0.0 2023-11-27 06:39:05,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3773360.0, ans=0.125 2023-11-27 06:39:05,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3773360.0, ans=0.125 2023-11-27 06:39:10,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3773426.6666666665, ans=0.0 2023-11-27 06:39:18,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3773426.6666666665, ans=10.0 2023-11-27 06:39:19,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3773426.6666666665, ans=0.05 2023-11-27 06:39:21,829 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 900, loss[loss=0.05986, simple_loss=0.08759, pruned_loss=0.009469, audio_tagging_loss=0.006601, over 15598.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08908, pruned_loss=0.01176, audio_tagging_loss=0.009086, over 3023119.49 frames. 
], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:39:24,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3773493.3333333335, ans=0.0 2023-11-27 06:39:29,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3773493.3333333335, ans=0.125 2023-11-27 06:39:34,546 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.653e+01 8.912e+01 9.588e+01 1.041e+02 1.300e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 06:39:45,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3773626.6666666665, ans=0.05 2023-11-27 06:39:49,523 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566050 2023-11-27 06:39:59,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3773693.3333333335, ans=0.0 2023-11-27 06:40:00,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3773693.3333333335, ans=0.125 2023-11-27 06:40:12,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3773760.0, ans=0.09899494936611666 2023-11-27 06:40:18,023 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 950, loss[loss=0.06077, simple_loss=0.08763, pruned_loss=0.0103, audio_tagging_loss=0.006655, over 15409.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08942, pruned_loss=0.01183, audio_tagging_loss=0.008952, over 3032515.72 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:40:29,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3773893.3333333335, ans=0.125 2023-11-27 06:40:32,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3773893.3333333335, ans=0.0 2023-11-27 06:40:34,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3773893.3333333335, ans=0.125 2023-11-27 06:40:44,807 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566100 2023-11-27 06:40:46,257 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.69 vs. limit=22.5 2023-11-27 06:40:48,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3773960.0, ans=0.1 2023-11-27 06:41:03,263 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:41:14,675 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1000, loss[loss=0.05937, simple_loss=0.07287, pruned_loss=0.01254, audio_tagging_loss=0.0104, over 15266.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09057, pruned_loss=0.0121, audio_tagging_loss=0.008748, over 3039052.06 frames. 
], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:41:22,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3774160.0, ans=0.2 2023-11-27 06:41:26,266 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.762e+01 9.089e+01 9.617e+01 1.045e+02 2.026e+02, threshold=1.923e+02, percent-clipped=1.0 2023-11-27 06:41:30,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3774226.6666666665, ans=0.125 2023-11-27 06:41:37,417 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 06:41:37,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3774293.3333333335, ans=0.1 2023-11-27 06:41:37,983 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.24 vs. limit=15.0 2023-11-27 06:41:41,187 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566150 2023-11-27 06:41:52,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3774360.0, ans=0.125 2023-11-27 06:42:10,091 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1050, loss[loss=0.07687, simple_loss=0.1029, pruned_loss=0.01872, audio_tagging_loss=0.006703, over 14373.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08974, pruned_loss=0.01207, audio_tagging_loss=0.008696, over 3036181.40 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:42:11,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3774493.3333333335, ans=0.0 2023-11-27 06:42:11,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3774493.3333333335, ans=0.125 2023-11-27 06:42:23,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3774560.0, ans=0.125 2023-11-27 06:42:27,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3774560.0, ans=0.125 2023-11-27 06:42:37,067 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566200 2023-11-27 06:42:41,163 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.73 vs. limit=10.0 2023-11-27 06:43:01,882 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.41 vs. 
limit=15.0 2023-11-27 06:43:02,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3774760.0, ans=0.125 2023-11-27 06:43:02,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3774760.0, ans=0.125 2023-11-27 06:43:06,157 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1100, loss[loss=0.061, simple_loss=0.07778, pruned_loss=0.01397, audio_tagging_loss=0.00814, over 16253.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08888, pruned_loss=0.01211, audio_tagging_loss=0.008599, over 3036990.83 frames. ], batch size: 62, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:43:08,305 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 06:43:18,468 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.493e+01 8.812e+01 9.589e+01 1.029e+02 1.833e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 06:43:33,091 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566250 2023-11-27 06:43:40,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3775026.6666666665, ans=0.125 2023-11-27 06:43:48,297 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.91 vs. limit=22.5 2023-11-27 06:44:02,254 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1150, loss[loss=0.08045, simple_loss=0.1098, pruned_loss=0.01791, audio_tagging_loss=0.00766, over 14892.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.09008, pruned_loss=0.01217, audio_tagging_loss=0.008489, over 3040242.40 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:44:06,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3775160.0, ans=0.1 2023-11-27 06:44:28,493 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566300 2023-11-27 06:44:53,303 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.27 vs. limit=10.0 2023-11-27 06:44:57,789 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1200, loss[loss=0.05882, simple_loss=0.08474, pruned_loss=0.007707, audio_tagging_loss=0.008745, over 15591.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.0887, pruned_loss=0.01194, audio_tagging_loss=0.008583, over 3037154.07 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 06:45:00,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3775493.3333333335, ans=0.125 2023-11-27 06:45:03,575 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.51 vs. 
limit=12.0 2023-11-27 06:45:09,561 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.024e+01 8.861e+01 9.720e+01 1.059e+02 1.366e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-27 06:45:18,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3775560.0, ans=0.1 2023-11-27 06:45:24,958 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566350 2023-11-27 06:45:35,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3775693.3333333335, ans=0.125 2023-11-27 06:45:36,578 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.51 vs. limit=15.0 2023-11-27 06:45:37,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3775693.3333333335, ans=0.2 2023-11-27 06:45:39,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3775693.3333333335, ans=0.125 2023-11-27 06:45:44,015 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.87 vs. limit=6.0 2023-11-27 06:45:46,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3775760.0, ans=0.125 2023-11-27 06:45:53,280 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1250, loss[loss=0.06252, simple_loss=0.08308, pruned_loss=0.01163, audio_tagging_loss=0.00934, over 14823.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08942, pruned_loss=0.01215, audio_tagging_loss=0.00853, over 3037798.87 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 06:46:17,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3775960.0, ans=0.1 2023-11-27 06:46:21,015 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566400 2023-11-27 06:46:26,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3776026.6666666665, ans=0.125 2023-11-27 06:46:32,455 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0 2023-11-27 06:46:41,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3776093.3333333335, ans=0.0 2023-11-27 06:46:42,952 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.20 vs. limit=22.5 2023-11-27 06:46:44,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3776093.3333333335, ans=0.0 2023-11-27 06:46:50,507 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1300, loss[loss=0.06927, simple_loss=0.1017, pruned_loss=0.009819, audio_tagging_loss=0.008575, over 16118.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08972, pruned_loss=0.01215, audio_tagging_loss=0.008504, over 3035580.06 frames. 
], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 06:46:53,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3776160.0, ans=0.1 2023-11-27 06:46:54,224 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.20 vs. limit=22.5 2023-11-27 06:47:02,754 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.604e+01 8.746e+01 9.407e+01 9.901e+01 1.217e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-27 06:47:09,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3776226.6666666665, ans=0.0 2023-11-27 06:47:13,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3776293.3333333335, ans=0.07 2023-11-27 06:47:16,613 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566450 2023-11-27 06:47:20,393 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.73 vs. limit=22.5 2023-11-27 06:47:33,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3776360.0, ans=0.5 2023-11-27 06:47:46,223 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1350, loss[loss=0.07476, simple_loss=0.1142, pruned_loss=0.01181, audio_tagging_loss=0.005833, over 15514.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08866, pruned_loss=0.01212, audio_tagging_loss=0.008544, over 3043241.50 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:47:48,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3776493.3333333335, ans=0.125 2023-11-27 06:47:59,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3776560.0, ans=0.0 2023-11-27 06:48:05,592 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.85 vs. limit=22.5 2023-11-27 06:48:12,398 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.24 vs. limit=15.0 2023-11-27 06:48:12,906 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566500 2023-11-27 06:48:14,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3776626.6666666665, ans=0.09899494936611666 2023-11-27 06:48:26,722 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 06:48:33,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3776760.0, ans=0.0 2023-11-27 06:48:39,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3776760.0, ans=0.1 2023-11-27 06:48:39,757 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:48:39,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3776760.0, ans=0.2 2023-11-27 06:48:41,663 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1400, loss[loss=0.06539, simple_loss=0.0853, pruned_loss=0.01072, audio_tagging_loss=0.01202, over 16731.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08978, pruned_loss=0.01222, audio_tagging_loss=0.008564, over 3046463.41 frames. ], batch size: 62, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:48:47,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3776826.6666666665, ans=0.2 2023-11-27 06:48:55,603 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.911e+01 9.491e+01 1.013e+02 1.381e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-27 06:48:57,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3776893.3333333335, ans=0.035 2023-11-27 06:49:01,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3776893.3333333335, ans=0.1 2023-11-27 06:49:09,730 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566550 2023-11-27 06:49:17,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3777026.6666666665, ans=0.1 2023-11-27 06:49:21,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3777026.6666666665, ans=0.0 2023-11-27 06:49:26,214 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.23 vs. limit=6.0 2023-11-27 06:49:38,317 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1450, loss[loss=0.07489, simple_loss=0.1069, pruned_loss=0.01349, audio_tagging_loss=0.007938, over 15612.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.09013, pruned_loss=0.0122, audio_tagging_loss=0.008641, over 3040652.30 frames. 
], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:49:55,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3777226.6666666665, ans=0.2 2023-11-27 06:50:03,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3777293.3333333335, ans=0.125 2023-11-27 06:50:05,190 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566600 2023-11-27 06:50:19,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3777360.0, ans=0.125 2023-11-27 06:50:21,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3777360.0, ans=0.125 2023-11-27 06:50:34,481 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1500, loss[loss=0.08397, simple_loss=0.1145, pruned_loss=0.01946, audio_tagging_loss=0.007263, over 14867.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08935, pruned_loss=0.01209, audio_tagging_loss=0.008729, over 3038912.62 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:50:34,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3777493.3333333335, ans=0.1 2023-11-27 06:50:47,149 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.689e+01 9.111e+01 9.628e+01 1.038e+02 1.478e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-27 06:50:49,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3777560.0, ans=0.125 2023-11-27 06:50:53,112 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.69 vs. limit=15.0 2023-11-27 06:50:58,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3777626.6666666665, ans=0.125 2023-11-27 06:51:00,525 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566650 2023-11-27 06:51:16,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3777693.3333333335, ans=0.0 2023-11-27 06:51:19,835 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.12 vs. limit=15.0 2023-11-27 06:51:20,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3777760.0, ans=0.0 2023-11-27 06:51:22,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3777760.0, ans=0.1 2023-11-27 06:51:23,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3777760.0, ans=0.0 2023-11-27 06:51:29,878 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1550, loss[loss=0.05857, simple_loss=0.07935, pruned_loss=0.01015, audio_tagging_loss=0.008746, over 14930.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08865, pruned_loss=0.01195, audio_tagging_loss=0.00873, over 3042084.12 frames. 
], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:51:31,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3777826.6666666665, ans=0.2 2023-11-27 06:51:41,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3777893.3333333335, ans=0.125 2023-11-27 06:51:57,621 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566700 2023-11-27 06:52:03,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3778026.6666666665, ans=0.0 2023-11-27 06:52:10,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=3778026.6666666665, ans=0.1 2023-11-27 06:52:12,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3778026.6666666665, ans=0.1 2023-11-27 06:52:17,717 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.97 vs. limit=15.0 2023-11-27 06:52:25,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3778160.0, ans=0.5 2023-11-27 06:52:26,230 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1600, loss[loss=0.05584, simple_loss=0.08079, pruned_loss=0.005438, audio_tagging_loss=0.01001, over 16245.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08882, pruned_loss=0.01191, audio_tagging_loss=0.008891, over 3047344.47 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 06:52:35,358 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.23 vs. limit=15.0 2023-11-27 06:52:40,788 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.447e+01 9.269e+01 9.916e+01 1.064e+02 1.389e+02, threshold=1.983e+02, percent-clipped=0.0 2023-11-27 06:52:49,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3778293.3333333335, ans=0.125 2023-11-27 06:52:53,731 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566750 2023-11-27 06:53:03,068 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.10 vs. limit=10.0 2023-11-27 06:53:18,566 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.47 vs. limit=15.0 2023-11-27 06:53:23,733 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1650, loss[loss=0.05901, simple_loss=0.07505, pruned_loss=0.01095, audio_tagging_loss=0.01054, over 15840.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08853, pruned_loss=0.01186, audio_tagging_loss=0.008944, over 3056408.45 frames. 
], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:53:28,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3778493.3333333335, ans=0.125 2023-11-27 06:53:35,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3778560.0, ans=0.2 2023-11-27 06:53:50,080 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566800 2023-11-27 06:54:09,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3778760.0, ans=0.0 2023-11-27 06:54:19,599 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1700, loss[loss=0.0686, simple_loss=0.09524, pruned_loss=0.01232, audio_tagging_loss=0.008655, over 15497.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08802, pruned_loss=0.01185, audio_tagging_loss=0.009017, over 3056550.27 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:54:20,281 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.02 vs. limit=15.0 2023-11-27 06:54:22,016 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:54:31,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3778893.3333333335, ans=0.1 2023-11-27 06:54:34,165 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.972e+01 9.443e+01 1.035e+02 1.327e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-27 06:54:42,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3778960.0, ans=0.125 2023-11-27 06:54:44,015 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.64 vs. limit=12.0 2023-11-27 06:54:47,281 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566850 2023-11-27 06:54:49,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3778960.0, ans=0.0 2023-11-27 06:54:58,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3779026.6666666665, ans=0.0 2023-11-27 06:55:03,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3779093.3333333335, ans=0.2 2023-11-27 06:55:03,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3779093.3333333335, ans=0.125 2023-11-27 06:55:09,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3779093.3333333335, ans=0.125 2023-11-27 06:55:15,408 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1750, loss[loss=0.08125, simple_loss=0.1252, pruned_loss=0.01218, audio_tagging_loss=0.006466, over 15154.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.08771, pruned_loss=0.01179, audio_tagging_loss=0.008899, over 3051819.73 frames. 
], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:55:21,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3779160.0, ans=0.125 2023-11-27 06:55:23,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3779160.0, ans=0.2 2023-11-27 06:55:34,761 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.61 vs. limit=15.0 2023-11-27 06:55:37,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3779293.3333333335, ans=0.0 2023-11-27 06:55:42,598 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566900 2023-11-27 06:55:47,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3779293.3333333335, ans=0.0 2023-11-27 06:56:12,158 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1800, loss[loss=0.06494, simple_loss=0.0883, pruned_loss=0.01349, audio_tagging_loss=0.007289, over 14529.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08883, pruned_loss=0.01198, audio_tagging_loss=0.008798, over 3049352.11 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:56:16,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3779493.3333333335, ans=0.1 2023-11-27 06:56:16,591 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.73 vs. limit=6.0 2023-11-27 06:56:26,449 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 9.045e+01 9.662e+01 1.034e+02 1.361e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-27 06:56:26,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3779560.0, ans=0.0 2023-11-27 06:56:38,860 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 566950 2023-11-27 06:56:54,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3779693.3333333335, ans=0.125 2023-11-27 06:57:07,975 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1850, loss[loss=0.07366, simple_loss=0.09963, pruned_loss=0.0143, audio_tagging_loss=0.00954, over 14927.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08908, pruned_loss=0.01205, audio_tagging_loss=0.008713, over 3053331.52 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:57:08,465 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.96 vs. limit=15.0 2023-11-27 06:57:31,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3779960.0, ans=0.0 2023-11-27 06:57:34,706 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567000 2023-11-27 06:57:56,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3780093.3333333335, ans=0.125 2023-11-27 06:58:04,549 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1900, loss[loss=0.05231, simple_loss=0.07104, pruned_loss=0.008653, audio_tagging_loss=0.008136, over 14726.00 frames. 
], tot_loss[loss=0.06524, simple_loss=0.08912, pruned_loss=0.01202, audio_tagging_loss=0.008669, over 3050903.55 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:58:19,413 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.670e+01 9.174e+01 9.812e+01 1.054e+02 1.527e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-27 06:58:31,861 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567050 2023-11-27 06:58:51,023 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.17 vs. limit=6.0 2023-11-27 06:58:58,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3780426.6666666665, ans=0.0 2023-11-27 06:59:00,542 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 1950, loss[loss=0.03431, simple_loss=0.04629, pruned_loss=0.005396, audio_tagging_loss=0.005772, over 15411.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08928, pruned_loss=0.01205, audio_tagging_loss=0.008573, over 3052428.38 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:59:03,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3780493.3333333335, ans=0.0 2023-11-27 06:59:05,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3780493.3333333335, ans=0.2 2023-11-27 06:59:15,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3780560.0, ans=0.2 2023-11-27 06:59:27,381 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567100 2023-11-27 06:59:48,861 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.61 vs. limit=6.0 2023-11-27 06:59:57,510 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2000, loss[loss=0.07346, simple_loss=0.1099, pruned_loss=0.01178, audio_tagging_loss=0.006741, over 16460.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08892, pruned_loss=0.012, audio_tagging_loss=0.008573, over 3044748.60 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:00:11,839 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.474e+01 9.102e+01 9.766e+01 1.042e+02 1.467e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-27 07:00:15,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3780893.3333333335, ans=0.0 2023-11-27 07:00:24,169 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567150 2023-11-27 07:00:41,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3781093.3333333335, ans=0.125 2023-11-27 07:00:42,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3781093.3333333335, ans=0.0 2023-11-27 07:00:52,954 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2050, loss[loss=0.06664, simple_loss=0.09399, pruned_loss=0.0125, audio_tagging_loss=0.007142, over 15146.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08929, pruned_loss=0.01212, audio_tagging_loss=0.008534, over 3042334.64 frames. 
], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:01:06,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3781226.6666666665, ans=0.125 2023-11-27 07:01:20,008 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567200 2023-11-27 07:01:23,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3781293.3333333335, ans=0.1 2023-11-27 07:01:49,454 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2100, loss[loss=0.07972, simple_loss=0.1059, pruned_loss=0.0193, audio_tagging_loss=0.007491, over 15523.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08937, pruned_loss=0.01203, audio_tagging_loss=0.008459, over 3037958.36 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:01:49,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3781493.3333333335, ans=0.1 2023-11-27 07:01:55,978 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.08 vs. limit=15.0 2023-11-27 07:01:58,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3781493.3333333335, ans=10.0 2023-11-27 07:01:59,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3781560.0, ans=0.0 2023-11-27 07:02:04,945 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 8.912e+01 9.473e+01 1.026e+02 1.468e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-27 07:02:16,286 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567250 2023-11-27 07:02:17,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3781626.6666666665, ans=0.125 2023-11-27 07:02:32,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3781693.3333333335, ans=0.0 2023-11-27 07:02:43,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3781760.0, ans=0.1 2023-11-27 07:02:45,487 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2150, loss[loss=0.09661, simple_loss=0.1354, pruned_loss=0.02202, audio_tagging_loss=0.006881, over 15659.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08911, pruned_loss=0.01183, audio_tagging_loss=0.008444, over 3037198.83 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:02:51,844 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.52 vs. limit=15.0 2023-11-27 07:03:12,869 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567300 2023-11-27 07:03:19,656 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 07:03:22,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3782026.6666666665, ans=0.125 2023-11-27 07:03:28,443 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.75 vs. limit=15.0 2023-11-27 07:03:41,264 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2200, loss[loss=0.06441, simple_loss=0.09413, pruned_loss=0.01024, audio_tagging_loss=0.0071, over 15511.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.0894, pruned_loss=0.01195, audio_tagging_loss=0.008445, over 3041730.43 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:03:45,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3782160.0, ans=0.1 2023-11-27 07:03:49,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3782160.0, ans=0.125 2023-11-27 07:03:57,267 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 8.960e+01 9.671e+01 1.061e+02 1.263e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-27 07:04:08,503 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567350 2023-11-27 07:04:10,011 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2023-11-27 07:04:16,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3782360.0, ans=0.125 2023-11-27 07:04:37,763 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2250, loss[loss=0.06467, simple_loss=0.08775, pruned_loss=0.01085, audio_tagging_loss=0.009945, over 14514.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08975, pruned_loss=0.01198, audio_tagging_loss=0.008455, over 3045442.64 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:04:45,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3782493.3333333335, ans=0.0 2023-11-27 07:05:03,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3782626.6666666665, ans=0.0 2023-11-27 07:05:04,596 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567400 2023-11-27 07:05:27,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3782760.0, ans=0.125 2023-11-27 07:05:34,183 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2300, loss[loss=0.07102, simple_loss=0.08954, pruned_loss=0.01553, audio_tagging_loss=0.01072, over 14818.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.0902, pruned_loss=0.01197, audio_tagging_loss=0.008382, over 3042331.32 frames. 
], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:05:37,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3782826.6666666665, ans=0.125 2023-11-27 07:05:41,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3782826.6666666665, ans=0.125 2023-11-27 07:05:49,430 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.463e+01 8.875e+01 9.360e+01 1.027e+02 1.398e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-27 07:06:01,222 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567450 2023-11-27 07:06:07,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3783026.6666666665, ans=0.125 2023-11-27 07:06:16,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3783026.6666666665, ans=0.1 2023-11-27 07:06:23,031 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:06:29,258 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2350, loss[loss=0.07867, simple_loss=0.115, pruned_loss=0.01371, audio_tagging_loss=0.007467, over 15412.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.0896, pruned_loss=0.01195, audio_tagging_loss=0.008422, over 3048738.28 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:06:36,566 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.43 vs. limit=15.0 2023-11-27 07:06:47,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3783226.6666666665, ans=0.2 2023-11-27 07:06:57,335 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567500 2023-11-27 07:06:57,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3783293.3333333335, ans=0.5 2023-11-27 07:07:03,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3783360.0, ans=0.125 2023-11-27 07:07:26,544 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2400, loss[loss=0.05575, simple_loss=0.06904, pruned_loss=0.009131, audio_tagging_loss=0.0121, over 15916.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08893, pruned_loss=0.01182, audio_tagging_loss=0.008549, over 3048453.06 frames. ], batch size: 62, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:07:33,757 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.76 vs. limit=15.0 2023-11-27 07:07:39,400 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.23 vs. 
limit=22.5 2023-11-27 07:07:42,060 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.653e+01 8.975e+01 9.647e+01 1.056e+02 1.487e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-27 07:07:47,947 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.15 vs. limit=15.0 2023-11-27 07:07:52,931 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567550 2023-11-27 07:08:00,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3783693.3333333335, ans=0.125 2023-11-27 07:08:22,604 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2450, loss[loss=0.07032, simple_loss=0.09742, pruned_loss=0.01054, audio_tagging_loss=0.01107, over 16897.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08907, pruned_loss=0.01179, audio_tagging_loss=0.008659, over 3057270.05 frames. ], batch size: 63, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:08:49,952 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567600 2023-11-27 07:09:18,675 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2500, loss[loss=0.06113, simple_loss=0.07824, pruned_loss=0.01064, audio_tagging_loss=0.01137, over 15577.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08909, pruned_loss=0.01172, audio_tagging_loss=0.008762, over 3057939.58 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:09:37,480 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.405e+01 8.971e+01 9.601e+01 1.047e+02 1.603e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-27 07:09:41,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3784293.3333333335, ans=0.2 2023-11-27 07:09:46,613 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567650 2023-11-27 07:09:50,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3784293.3333333335, ans=0.125 2023-11-27 07:10:01,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3784360.0, ans=0.1 2023-11-27 07:10:05,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3784426.6666666665, ans=0.0 2023-11-27 07:10:16,040 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2550, loss[loss=0.06382, simple_loss=0.09608, pruned_loss=0.006548, audio_tagging_loss=0.009232, over 16197.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08843, pruned_loss=0.01171, audio_tagging_loss=0.008775, over 3050881.65 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:10:28,157 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.46 vs. limit=12.0 2023-11-27 07:10:29,041 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.99 vs. 
limit=15.0 2023-11-27 07:10:38,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3784626.6666666665, ans=0.1 2023-11-27 07:10:42,456 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567700 2023-11-27 07:10:42,858 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.66 vs. limit=15.0 2023-11-27 07:10:59,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3784693.3333333335, ans=0.125 2023-11-27 07:11:09,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3784760.0, ans=0.0 2023-11-27 07:11:10,310 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:11:11,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3784826.6666666665, ans=0.0 2023-11-27 07:11:12,339 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2600, loss[loss=0.0566, simple_loss=0.08465, pruned_loss=0.008037, audio_tagging_loss=0.006236, over 13930.00 frames. ], tot_loss[loss=0.06439, simple_loss=0.08852, pruned_loss=0.01158, audio_tagging_loss=0.008554, over 3048338.25 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:11:29,533 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.178e+01 8.922e+01 9.501e+01 1.026e+02 1.288e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-27 07:11:38,609 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567750 2023-11-27 07:12:06,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3785093.3333333335, ans=0.1 2023-11-27 07:12:07,910 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2650, loss[loss=0.0844, simple_loss=0.1145, pruned_loss=0.01512, audio_tagging_loss=0.01205, over 15997.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08866, pruned_loss=0.0116, audio_tagging_loss=0.008613, over 3046213.29 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:12:14,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3785160.0, ans=0.0 2023-11-27 07:12:20,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3785226.6666666665, ans=0.125 2023-11-27 07:12:35,620 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567800 2023-11-27 07:12:35,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3785293.3333333335, ans=0.125 2023-11-27 07:12:53,055 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.75 vs. limit=15.0 2023-11-27 07:12:55,275 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0
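The recurring "Whitening: name=..., num_groups=..., metric=M vs. limit=L" records above are scaling.py's whitening diagnostic: each instrumented module summarizes how far the covariance of its activations (channels split into num_groups groups) sits from a multiple of the identity, and only pushes back on the activations when the metric exceeds its limit, so a reading like metric=2.18 vs. limit=6.0 means the constraint is inactive. Below is a minimal sketch of a statistic with that behavior; the exact formula in scaling.py may differ, and whitening_metric is an illustrative name rather than the library function:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    """x: (num_frames, num_channels); channels are split into num_groups groups."""
    n, c = x.shape
    cg = c // num_groups
    x = x.reshape(n, num_groups, cg).transpose(0, 1)   # (num_groups, n, cg)
    covar = torch.matmul(x.transpose(1, 2), x) / n     # per-group covariance
    diag_mean = covar.diagonal(dim1=1, dim2=2).mean()
    # ~1.0 when each group's covariance is a multiple of the identity;
    # grows as channels become correlated or unevenly scaled.
    return cg * (covar ** 2).mean() / (diag_mean ** 2 + 1e-20)

white = torch.randn(1000, 128)
print(float(whitening_metric(white, num_groups=4)))    # ~1, far below limit=6.0
mixed = white @ torch.randn(128, 128)                  # correlated channels
print(float(whitening_metric(mixed, num_groups=4)))    # >> 1, would trip the limit
```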
2023-11-27 07:13:00,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3785426.6666666665, ans=0.125 2023-11-27 07:13:03,847 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2700, loss[loss=0.04751, simple_loss=0.06537, pruned_loss=0.006782, audio_tagging_loss=0.008041, over 13788.00 frames. ], tot_loss[loss=0.06434, simple_loss=0.0885, pruned_loss=0.01158, audio_tagging_loss=0.00851, over 3054701.59 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:13:07,251 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.92 vs. limit=15.0 2023-11-27 07:13:18,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3785560.0, ans=0.5 2023-11-27 07:13:22,682 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.681e+01 9.127e+01 9.631e+01 1.043e+02 1.339e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-27 07:13:23,163 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2023-11-27 07:13:24,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3785560.0, ans=10.0 2023-11-27 07:13:31,260 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567850 2023-11-27 07:13:39,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3785693.3333333335, ans=0.125 2023-11-27 07:13:53,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3785760.0, ans=0.0 2023-11-27 07:14:00,771 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2750, loss[loss=0.06705, simple_loss=0.09594, pruned_loss=0.00881, audio_tagging_loss=0.01027, over 14398.00 frames. ], tot_loss[loss=0.06401, simple_loss=0.08795, pruned_loss=0.01156, audio_tagging_loss=0.008468, over 3049800.51 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:14:16,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3785893.3333333335, ans=0.07 2023-11-27 07:14:26,688 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567900 2023-11-27 07:14:33,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3786026.6666666665, ans=0.1 2023-11-27 07:14:48,514 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:14:49,102 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.73 vs. limit=15.0
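The WARNING just above shows why the 1-second AudioSet placeholder cuts keep being dropped: 100 feature frames shrink to 23 under the encoder's roughly 4x subsampling, which is fewer than the 24 dummy BPE tokens, so no transducer alignment exists. A minimal sketch of that filter, assuming the usual icefall-style convolutional subsampling arithmetic (both helper names here are illustrative):

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Conv2dSubsampling-style arithmetic; reproduces the logged 100 -> 23.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # The pruned transducer loss needs at least one encoder frame per token.
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))                # 23
print(keep_cut(num_frames=100, num_tokens=24))      # False -> cut is excluded
```

With 23 frames against 24 tokens the check fails, which matches every "Exclude cut" WARNING in this log.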
2023-11-27 07:14:56,124 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2800, loss[loss=0.05454, simple_loss=0.07613, pruned_loss=0.007655, audio_tagging_loss=0.008824, over 15026.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08896, pruned_loss=0.01178, audio_tagging_loss=0.008425, over 3049607.48 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:15:07,205 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.80 vs. limit=22.5 2023-11-27 07:15:12,202 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.79 vs. limit=15.0 2023-11-27 07:15:12,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3786226.6666666665, ans=0.2 2023-11-27 07:15:12,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3786226.6666666665, ans=0.125 2023-11-27 07:15:14,278 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.005e+01 9.123e+01 9.680e+01 1.057e+02 2.633e+02, threshold=1.936e+02, percent-clipped=1.0 2023-11-27 07:15:20,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3786293.3333333335, ans=0.2 2023-11-27 07:15:23,603 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 567950 2023-11-27 07:15:52,500 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2850, loss[loss=0.05539, simple_loss=0.07301, pruned_loss=0.01103, audio_tagging_loss=0.007856, over 14027.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08886, pruned_loss=0.01184, audio_tagging_loss=0.008461, over 3045610.82 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:16:00,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3786493.3333333335, ans=0.1 2023-11-27 07:16:19,133 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568000 2023-11-27 07:16:20,397 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-568000.pt 2023-11-27 07:16:50,536 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2900, loss[loss=0.07018, simple_loss=0.1047, pruned_loss=0.008978, audio_tagging_loss=0.008838, over 14472.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.09037, pruned_loss=0.01223, audio_tagging_loss=0.008467, over 3042696.46 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:17:07,510 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.788e+01 8.964e+01 9.454e+01 1.013e+02 1.177e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-27 07:17:07,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3786893.3333333335, ans=0.2 2023-11-27 07:17:08,927 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.06 vs. limit=12.0 2023-11-27 07:17:16,006 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568050 2023-11-27 07:17:45,191 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 2950, loss[loss=0.06421, simple_loss=0.08373, pruned_loss=0.01481, audio_tagging_loss=0.007535, over 15361.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08905, pruned_loss=0.01195, audio_tagging_loss=0.008509, over 3047501.29 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0
2023-11-27 07:18:08,109 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.13 vs. limit=15.0 2023-11-27 07:18:11,721 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568100 2023-11-27 07:18:20,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3787360.0, ans=0.125 2023-11-27 07:18:36,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3787426.6666666665, ans=0.1 2023-11-27 07:18:39,066 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2023-11-27 07:18:40,619 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3000, loss[loss=0.0744, simple_loss=0.1024, pruned_loss=0.01652, audio_tagging_loss=0.006673, over 16015.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08976, pruned_loss=0.01212, audio_tagging_loss=0.008472, over 3048422.92 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:18:40,621 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-27 07:18:56,477 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9451, 3.0992, 2.7806, 2.9621, 3.2903, 2.6194, 3.3659, 2.4539], device='cuda:0') 2023-11-27 07:19:01,224 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1573, 2.3865, 5.0462, 3.0073], device='cuda:0') 2023-11-27 07:19:13,018 INFO [train_asr.py:1267] (0/4) Epoch 48, validation: loss=0.05781, simple_loss=0.05047, pruned_loss=0.005352, audio_tagging_loss=0.02722, over 4681554.00 frames. 2023-11-27 07:19:13,019 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-27 07:19:31,101 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.941e+01 8.979e+01 9.616e+01 1.040e+02 1.231e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-27 07:19:39,354 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568150 2023-11-27 07:20:08,445 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3050, loss[loss=0.06338, simple_loss=0.09044, pruned_loss=0.0115, audio_tagging_loss=0.006654, over 14352.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09052, pruned_loss=0.01225, audio_tagging_loss=0.008577, over 3045854.89 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:20:26,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3787893.3333333335, ans=0.125 2023-11-27 07:20:30,117 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.44 vs. limit=15.0 2023-11-27 07:20:35,630 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568200 2023-11-27 07:20:35,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3787960.0, ans=0.125 2023-11-27 07:20:41,573 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
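During the validation pass above, zipformer.py also dumps one attention-weight entropy per head as a health check: a head that has collapsed onto a single key reports an entropy near zero, while a maximally diffuse head approaches log(key_len). A minimal sketch of such a diagnostic, with assumed tensor shapes (this is not the zipformer.py code):

```python
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    """attn: (num_heads, query_len, key_len); each row is a softmax distribution."""
    p = attn.clamp_min(1e-20)
    return -(p * p.log()).sum(dim=-1).mean(dim=-1)   # mean entropy per head

weights = torch.softmax(torch.randn(8, 16, 16), dim=-1)
print(attn_weights_entropy(weights))   # one value per head, like the logged tensors
```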
2023-11-27 07:20:52,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3788093.3333333335, ans=0.025 2023-11-27 07:20:59,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3788093.3333333335, ans=0.95 2023-11-27 07:21:03,306 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=22.5 2023-11-27 07:21:04,510 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3100, loss[loss=0.06449, simple_loss=0.07808, pruned_loss=0.01215, audio_tagging_loss=0.0133, over 15421.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09061, pruned_loss=0.01217, audio_tagging_loss=0.008601, over 3043874.87 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:21:04,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3788160.0, ans=0.125 2023-11-27 07:21:21,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3788226.6666666665, ans=0.0 2023-11-27 07:21:24,096 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.788e+01 9.299e+01 9.781e+01 1.036e+02 1.255e+02, threshold=1.956e+02, percent-clipped=0.0 2023-11-27 07:21:31,533 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568250 2023-11-27 07:21:32,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3788293.3333333335, ans=0.0 2023-11-27 07:21:37,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3788360.0, ans=0.1 2023-11-27 07:21:50,618 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.11 vs. limit=15.0 2023-11-27 07:21:55,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3788426.6666666665, ans=0.1 2023-11-27 07:22:00,752 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3150, loss[loss=0.06915, simple_loss=0.103, pruned_loss=0.01079, audio_tagging_loss=0.006878, over 17226.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09138, pruned_loss=0.01222, audio_tagging_loss=0.008633, over 3036227.69 frames.
], batch size: 63, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:22:04,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3788493.3333333335, ans=0.125 2023-11-27 07:22:17,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3788560.0, ans=0.125 2023-11-27 07:22:26,898 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568300 2023-11-27 07:22:29,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3788626.6666666665, ans=0.125 2023-11-27 07:22:32,193 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.40 vs. limit=10.0 2023-11-27 07:22:52,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3788760.0, ans=0.125 2023-11-27 07:22:56,753 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3200, loss[loss=0.07413, simple_loss=0.1085, pruned_loss=0.01449, audio_tagging_loss=0.005378, over 16784.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09089, pruned_loss=0.01208, audio_tagging_loss=0.008705, over 3043585.62 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:23:06,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3788893.3333333335, ans=0.125 2023-11-27 07:23:07,976 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.70 vs. limit=15.0 2023-11-27 07:23:15,178 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.870e+01 9.115e+01 9.612e+01 1.036e+02 1.415e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-27 07:23:23,211 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568350 2023-11-27 07:23:23,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3788960.0, ans=0.2 2023-11-27 07:23:23,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3788960.0, ans=0.125 2023-11-27 07:23:29,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3789026.6666666665, ans=0.125 2023-11-27 07:23:37,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3789026.6666666665, ans=0.1 2023-11-27 07:23:46,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3789093.3333333335, ans=0.025 2023-11-27 07:23:51,839 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3250, loss[loss=0.07337, simple_loss=0.0964, pruned_loss=0.01645, audio_tagging_loss=0.008721, over 15782.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09084, pruned_loss=0.01215, audio_tagging_loss=0.008766, over 3042663.34 frames. 
], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:24:12,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3789226.6666666665, ans=0.0 2023-11-27 07:24:19,674 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568400 2023-11-27 07:24:27,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3789360.0, ans=0.0 2023-11-27 07:24:48,737 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3300, loss[loss=0.07049, simple_loss=0.1026, pruned_loss=0.008732, audio_tagging_loss=0.01048, over 15765.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09087, pruned_loss=0.01221, audio_tagging_loss=0.00883, over 3038040.48 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:24:49,050 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:24:59,494 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.12 vs. limit=22.5 2023-11-27 07:25:07,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3789560.0, ans=0.2 2023-11-27 07:25:07,998 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.162e+01 9.209e+01 1.006e+02 1.091e+02 1.432e+02, threshold=2.012e+02, percent-clipped=0.0 2023-11-27 07:25:15,512 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568450 2023-11-27 07:25:20,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3789626.6666666665, ans=0.0 2023-11-27 07:25:23,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3789693.3333333335, ans=0.1 2023-11-27 07:25:33,025 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.63 vs. limit=12.0 2023-11-27 07:25:42,312 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.42 vs. limit=22.5 2023-11-27 07:25:44,966 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3350, loss[loss=0.06987, simple_loss=0.1019, pruned_loss=0.01378, audio_tagging_loss=0.005155, over 14728.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09098, pruned_loss=0.01233, audio_tagging_loss=0.008776, over 3043534.68 frames. 
], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:25:54,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3789826.6666666665, ans=0.0 2023-11-27 07:26:12,267 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568500 2023-11-27 07:26:24,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3790026.6666666665, ans=0.125 2023-11-27 07:26:30,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3790093.3333333335, ans=0.2 2023-11-27 07:26:37,951 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:26:40,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3790160.0, ans=0.125 2023-11-27 07:26:40,917 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3400, loss[loss=0.05292, simple_loss=0.06925, pruned_loss=0.00902, audio_tagging_loss=0.009268, over 14807.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09131, pruned_loss=0.01243, audio_tagging_loss=0.008651, over 3045588.32 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:27:00,696 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.647e+01 9.061e+01 9.680e+01 1.035e+02 1.293e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-27 07:27:03,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3790293.3333333335, ans=0.0 2023-11-27 07:27:08,318 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568550 2023-11-27 07:27:11,109 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.17 vs. limit=15.0 2023-11-27 07:27:12,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3790293.3333333335, ans=0.0 2023-11-27 07:27:31,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3790426.6666666665, ans=0.1 2023-11-27 07:27:34,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3790426.6666666665, ans=0.125 2023-11-27 07:27:37,469 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3450, loss[loss=0.08589, simple_loss=0.1228, pruned_loss=0.0172, audio_tagging_loss=0.007309, over 16326.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09191, pruned_loss=0.01248, audio_tagging_loss=0.008474, over 3046849.77 frames. 
], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:27:38,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3790493.3333333335, ans=0.125 2023-11-27 07:28:01,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3790626.6666666665, ans=0.2 2023-11-27 07:28:04,056 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568600 2023-11-27 07:28:05,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3790626.6666666665, ans=0.125 2023-11-27 07:28:06,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3790626.6666666665, ans=0.1 2023-11-27 07:28:33,474 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3500, loss[loss=0.06411, simple_loss=0.0841, pruned_loss=0.01188, audio_tagging_loss=0.01018, over 15839.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09065, pruned_loss=0.01224, audio_tagging_loss=0.008428, over 3053067.22 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:28:34,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3790826.6666666665, ans=0.2 2023-11-27 07:28:51,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3790893.3333333335, ans=0.0 2023-11-27 07:28:52,423 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.089e+01 9.044e+01 9.629e+01 1.032e+02 1.263e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-27 07:28:54,031 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.70 vs. limit=22.5 2023-11-27 07:29:00,497 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568650 2023-11-27 07:29:02,535 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:29:03,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3790960.0, ans=0.125 2023-11-27 07:29:07,821 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.48 vs. limit=6.0 2023-11-27 07:29:22,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3791093.3333333335, ans=0.0 2023-11-27 07:29:29,117 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3550, loss[loss=0.07994, simple_loss=0.1148, pruned_loss=0.01687, audio_tagging_loss=0.00564, over 14407.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08968, pruned_loss=0.01212, audio_tagging_loss=0.008438, over 3051700.07 frames. 
], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:29:31,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3791160.0, ans=0.125 2023-11-27 07:29:35,461 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.93 vs. limit=15.0 2023-11-27 07:29:56,244 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568700 2023-11-27 07:29:56,904 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.10 vs. limit=15.0 2023-11-27 07:30:21,144 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.20 vs. limit=15.0 2023-11-27 07:30:25,372 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3600, loss[loss=0.07679, simple_loss=0.1132, pruned_loss=0.01385, audio_tagging_loss=0.006347, over 14695.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08918, pruned_loss=0.01189, audio_tagging_loss=0.008432, over 3046763.05 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:30:28,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3791493.3333333335, ans=0.125 2023-11-27 07:30:43,741 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.448e+01 8.813e+01 9.473e+01 1.021e+02 1.241e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-27 07:30:51,177 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568750 2023-11-27 07:31:20,980 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3650, loss[loss=0.05406, simple_loss=0.074, pruned_loss=0.01031, audio_tagging_loss=0.006747, over 13259.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08833, pruned_loss=0.01185, audio_tagging_loss=0.008465, over 3047125.58 frames. ], batch size: 51, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:31:24,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3791826.6666666665, ans=0.125 2023-11-27 07:31:26,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3791826.6666666665, ans=0.0 2023-11-27 07:31:39,260 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.78 vs. limit=22.5 2023-11-27 07:31:47,526 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568800 2023-11-27 07:31:51,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3791960.0, ans=0.125 2023-11-27 07:32:06,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3792093.3333333335, ans=0.125 2023-11-27 07:32:12,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3792093.3333333335, ans=0.125 2023-11-27 07:32:14,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3792093.3333333335, ans=0.2 2023-11-27 07:32:16,508 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3700, loss[loss=0.06087, simple_loss=0.08177, pruned_loss=0.01243, audio_tagging_loss=0.007549, over 15631.00 frames. 
], tot_loss[loss=0.06466, simple_loss=0.08835, pruned_loss=0.01191, audio_tagging_loss=0.008575, over 3047532.67 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:32:21,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3792160.0, ans=0.0 2023-11-27 07:32:37,081 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 9.068e+01 9.744e+01 1.049e+02 1.191e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-27 07:32:40,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3792293.3333333335, ans=0.0 2023-11-27 07:32:41,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3792293.3333333335, ans=0.025 2023-11-27 07:32:43,342 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=15.0 2023-11-27 07:32:44,051 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568850 2023-11-27 07:32:50,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3792360.0, ans=0.125 2023-11-27 07:33:13,036 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3750, loss[loss=0.07164, simple_loss=0.09935, pruned_loss=0.01229, audio_tagging_loss=0.009674, over 14651.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08839, pruned_loss=0.01196, audio_tagging_loss=0.008692, over 3042163.09 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:33:22,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3792493.3333333335, ans=0.125 2023-11-27 07:33:30,762 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2023-11-27 07:33:31,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3792560.0, ans=0.125 2023-11-27 07:33:33,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3792560.0, ans=0.125 2023-11-27 07:33:39,742 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568900 2023-11-27 07:33:47,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3792693.3333333335, ans=0.125 2023-11-27 07:33:52,681 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 07:33:59,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3792760.0, ans=0.125 2023-11-27 07:33:59,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3792760.0, ans=0.1 2023-11-27 07:34:09,536 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3800, loss[loss=0.06993, simple_loss=0.09345, pruned_loss=0.01222, audio_tagging_loss=0.01099, over 15723.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08898, pruned_loss=0.01203, audio_tagging_loss=0.008744, over 3042479.73 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:34:24,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3792893.3333333335, ans=0.125 2023-11-27 07:34:28,603 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.900e+01 9.318e+01 9.986e+01 1.074e+02 1.810e+02, threshold=1.997e+02, percent-clipped=0.0 2023-11-27 07:34:35,963 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 568950 2023-11-27 07:34:41,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3793026.6666666665, ans=0.125 2023-11-27 07:34:45,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3793026.6666666665, ans=0.125 2023-11-27 07:35:04,403 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3850, loss[loss=0.06645, simple_loss=0.09533, pruned_loss=0.01162, audio_tagging_loss=0.007164, over 14266.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08882, pruned_loss=0.01193, audio_tagging_loss=0.008728, over 3042438.79 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:35:06,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3793160.0, ans=0.125 2023-11-27 07:35:09,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3793160.0, ans=0.2 2023-11-27 07:35:11,362 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2023-11-27 07:35:15,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3793226.6666666665, ans=0.0 2023-11-27 07:35:32,174 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569000 2023-11-27 07:35:47,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3793360.0, ans=0.125 2023-11-27 07:36:00,491 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3900, loss[loss=0.07503, simple_loss=0.1022, pruned_loss=0.01504, audio_tagging_loss=0.00891, over 15833.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08842, pruned_loss=0.01181, audio_tagging_loss=0.008803, over 3035849.50 frames. 
], batch size: 56, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:36:22,247 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 9.249e+01 9.619e+01 1.030e+02 1.289e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-27 07:36:24,042 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2023-11-27 07:36:27,699 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569050 2023-11-27 07:36:45,448 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:36:53,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3793760.0, ans=0.125 2023-11-27 07:36:56,877 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 3950, loss[loss=0.0563, simple_loss=0.06953, pruned_loss=0.007684, audio_tagging_loss=0.01385, over 14557.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08816, pruned_loss=0.01186, audio_tagging_loss=0.008885, over 3031898.71 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:37:07,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3793893.3333333335, ans=0.125 2023-11-27 07:37:23,056 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569100 2023-11-27 07:37:36,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3794026.6666666665, ans=0.125 2023-11-27 07:37:36,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3794026.6666666665, ans=0.2 2023-11-27 07:37:52,207 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4000, loss[loss=0.07267, simple_loss=0.1027, pruned_loss=0.01318, audio_tagging_loss=0.008156, over 15125.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08893, pruned_loss=0.01217, audio_tagging_loss=0.008938, over 3035028.86 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:38:13,569 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.003e+01 9.122e+01 1.001e+02 1.075e+02 1.362e+02, threshold=2.001e+02, percent-clipped=0.0 2023-11-27 07:38:20,072 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569150 2023-11-27 07:38:23,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3794293.3333333335, ans=0.125 2023-11-27 07:38:44,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3794426.6666666665, ans=0.125 2023-11-27 07:38:48,289 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4050, loss[loss=0.09365, simple_loss=0.1302, pruned_loss=0.02114, audio_tagging_loss=0.00741, over 15429.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.0895, pruned_loss=0.01218, audio_tagging_loss=0.008908, over 3036026.73 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:38:51,497 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:38:54,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3794493.3333333335, ans=0.2 2023-11-27 07:39:08,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3794560.0, ans=0.125 2023-11-27 07:39:10,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3794626.6666666665, ans=0.1 2023-11-27 07:39:15,558 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569200 2023-11-27 07:39:20,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3794626.6666666665, ans=0.125 2023-11-27 07:39:23,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3794693.3333333335, ans=10.0 2023-11-27 07:39:23,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3794693.3333333335, ans=0.125 2023-11-27 07:39:44,867 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4100, loss[loss=0.07108, simple_loss=0.09041, pruned_loss=0.0165, audio_tagging_loss=0.00938, over 13906.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08975, pruned_loss=0.01235, audio_tagging_loss=0.008884, over 3037462.71 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:39:45,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3794826.6666666665, ans=0.0 2023-11-27 07:39:58,815 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.13 vs. limit=15.0 2023-11-27 07:40:05,684 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.307e+01 9.057e+01 9.659e+01 1.032e+02 2.111e+02, threshold=1.932e+02, percent-clipped=1.0 2023-11-27 07:40:06,223 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.39 vs. limit=22.5
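The "Clipping_scale=2.0, grad-norm quartiles ..." records are self-consistent with a median-based clipping rule: the five numbers read as min/25%/median/75%/max of recent gradient norms, each reported threshold equals 2.0 times the median (here 2.0 x 9.659e+01 = 1.932e+02), and percent-clipped=1.0 reflects the one batch whose norm (2.111e+02) exceeded that threshold. A minimal sketch of an adaptive clipper with this behavior, as a stand-in for (not a copy of) the optim.py implementation:

```python
import torch

class MedianGradClipper:
    """Clip gradients to clipping_scale x the running median gradient norm."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 1024):
        self.scale, self.window, self.norms = clipping_scale, window, []

    def clip_(self, params) -> float:
        grads = [p.grad for p in params if p.grad is not None]
        norm = float(torch.norm(torch.stack([g.norm() for g in grads])))
        self.norms = (self.norms + [norm])[-self.window:]
        hist = torch.tensor(self.norms)
        q = torch.quantile(hist, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.scale * float(q[2])    # 2.0 x median, as in the log
        if norm > threshold:                    # counted toward percent-clipped
            for g in grads:
                g.mul_(threshold / norm)
        return threshold

model = torch.nn.Linear(4, 4)
clipper = MedianGradClipper()
model(torch.randn(8, 4)).pow(2).mean().backward()
print(clipper.clip_(model.parameters()))
```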
2023-11-27 07:40:10,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3794960.0, ans=0.125 2023-11-27 07:40:11,106 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569250 2023-11-27 07:40:15,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3794960.0, ans=0.2 2023-11-27 07:40:21,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3795026.6666666665, ans=0.95 2023-11-27 07:40:32,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3795093.3333333335, ans=0.125 2023-11-27 07:40:34,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3795093.3333333335, ans=0.0 2023-11-27 07:40:37,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3795093.3333333335, ans=0.0 2023-11-27 07:40:40,542 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4150, loss[loss=0.05413, simple_loss=0.07636, pruned_loss=0.01083, audio_tagging_loss=0.00512, over 14690.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08939, pruned_loss=0.01233, audio_tagging_loss=0.008742, over 3038565.92 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:40:52,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3795226.6666666665, ans=0.0 2023-11-27 07:41:01,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3795293.3333333335, ans=0.125 2023-11-27 07:41:05,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3795293.3333333335, ans=0.125 2023-11-27 07:41:07,134 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569300 2023-11-27 07:41:18,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3795360.0, ans=0.0 2023-11-27 07:41:22,169 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:41:25,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3795426.6666666665, ans=0.0 2023-11-27 07:41:28,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3795426.6666666665, ans=0.125 2023-11-27 07:41:31,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3795426.6666666665, ans=0.2 2023-11-27 07:41:32,387 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.26 vs. limit=15.0
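The ScheduledFloat records report "ans", the value a scheduled hyperparameter (balancer probabilities, skip rates, dropout_p, scale_min, and so on) takes at the current batch_count; these are piecewise-linear functions of training progress that typically decay to a floor early in training, which is why a mature run like this one logs mostly constants such as ans=0.125. A minimal sketch of such a schedule; the class name and breakpoints below are illustrative, not the ones configured for these layers:

```python
from bisect import bisect_right

class PiecewiseScheduledFloat:
    """A float that is a piecewise-linear function of the batch count."""

    def __init__(self, *points):                 # e.g. (0.0, 0.3), (20000.0, 0.125)
        self.xs, self.ys = zip(*sorted(points))

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

prob = PiecewiseScheduledFloat((0.0, 0.3), (20000.0, 0.125))
print(prob(3794960.0))   # 0.125 -- long past the final breakpoint, as logged
```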
2023-11-27 07:41:36,083 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4200, loss[loss=0.07439, simple_loss=0.1062, pruned_loss=0.01343, audio_tagging_loss=0.007849, over 15573.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09017, pruned_loss=0.01231, audio_tagging_loss=0.008654, over 3047218.79 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:41:42,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3795493.3333333335, ans=0.0 2023-11-27 07:41:46,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=3795560.0, ans=0.2 2023-11-27 07:41:57,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3795560.0, ans=0.125 2023-11-27 07:41:57,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3795560.0, ans=0.125 2023-11-27 07:41:58,440 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 9.145e+01 9.853e+01 1.047e+02 1.662e+02, threshold=1.971e+02, percent-clipped=0.0 2023-11-27 07:42:03,282 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.95 vs. limit=10.0 2023-11-27 07:42:03,836 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569350 2023-11-27 07:42:07,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3795626.6666666665, ans=0.05 2023-11-27 07:42:07,508 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.73 vs. limit=12.0 2023-11-27 07:42:09,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3795693.3333333335, ans=0.0 2023-11-27 07:42:13,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3795693.3333333335, ans=0.125 2023-11-27 07:42:21,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3795760.0, ans=0.125 2023-11-27 07:42:22,955 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.83 vs. limit=15.0 2023-11-27 07:42:27,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3795760.0, ans=0.125 2023-11-27 07:42:32,525 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4250, loss[loss=0.05308, simple_loss=0.07159, pruned_loss=0.007434, audio_tagging_loss=0.009845, over 15416.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08943, pruned_loss=0.01214, audio_tagging_loss=0.008572, over 3049735.77 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:42:35,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3795826.6666666665, ans=0.125 2023-11-27 07:42:42,711 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.98 vs.
limit=15.0 2023-11-27 07:42:50,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3795893.3333333335, ans=0.125 2023-11-27 07:42:53,399 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.76 vs. limit=15.0 2023-11-27 07:42:54,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3795960.0, ans=0.2 2023-11-27 07:42:59,163 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569400 2023-11-27 07:43:27,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3796093.3333333335, ans=0.2 2023-11-27 07:43:28,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3796160.0, ans=0.2 2023-11-27 07:43:29,340 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4300, loss[loss=0.06389, simple_loss=0.09486, pruned_loss=0.009699, audio_tagging_loss=0.006758, over 15506.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09056, pruned_loss=0.01228, audio_tagging_loss=0.008438, over 3050655.27 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:43:42,497 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.62 vs. limit=15.0 2023-11-27 07:43:43,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3796226.6666666665, ans=0.125 2023-11-27 07:43:44,913 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.61 vs. limit=22.5 2023-11-27 07:43:50,068 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.655e+01 9.134e+01 9.847e+01 1.057e+02 1.328e+02, threshold=1.969e+02, percent-clipped=0.0 2023-11-27 07:43:56,063 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569450 2023-11-27 07:44:12,594 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.62 vs. limit=22.5 2023-11-27 07:44:14,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3796426.6666666665, ans=0.125 2023-11-27 07:44:14,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3796426.6666666665, ans=0.1 2023-11-27 07:44:23,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3796426.6666666665, ans=0.125 2023-11-27 07:44:24,878 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4350, loss[loss=0.06937, simple_loss=0.08981, pruned_loss=0.01372, audio_tagging_loss=0.01075, over 14472.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09082, pruned_loss=0.0124, audio_tagging_loss=0.008473, over 3046572.42 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:44:36,561 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.73 vs. 
limit=15.0 2023-11-27 07:44:43,880 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:44:52,876 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569500 2023-11-27 07:44:53,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3796626.6666666665, ans=0.1 2023-11-27 07:44:58,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3796693.3333333335, ans=0.125 2023-11-27 07:45:05,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3796693.3333333335, ans=0.125 2023-11-27 07:45:15,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3796760.0, ans=0.0 2023-11-27 07:45:20,909 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4400, loss[loss=0.06758, simple_loss=0.09663, pruned_loss=0.01178, audio_tagging_loss=0.007494, over 14936.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09073, pruned_loss=0.01238, audio_tagging_loss=0.008532, over 3045137.62 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:45:22,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3796826.6666666665, ans=0.1 2023-11-27 07:45:42,614 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 8.963e+01 9.534e+01 1.011e+02 1.280e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 07:45:47,990 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569550 2023-11-27 07:45:58,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3797026.6666666665, ans=0.0 2023-11-27 07:46:17,080 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4450, loss[loss=0.06206, simple_loss=0.08608, pruned_loss=0.009267, audio_tagging_loss=0.00976, over 15024.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09088, pruned_loss=0.01226, audio_tagging_loss=0.00846, over 3049206.30 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:46:23,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3797160.0, ans=0.125 2023-11-27 07:46:29,008 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.91 vs. limit=15.0 2023-11-27 07:46:43,690 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569600 2023-11-27 07:47:05,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3797426.6666666665, ans=0.1 2023-11-27 07:47:11,252 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.69 vs. limit=22.5 2023-11-27 07:47:12,975 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4500, loss[loss=0.06114, simple_loss=0.08424, pruned_loss=0.0113, audio_tagging_loss=0.007721, over 14751.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08988, pruned_loss=0.01213, audio_tagging_loss=0.008489, over 3046792.45 frames. 
], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:47:15,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3797493.3333333335, ans=0.125 2023-11-27 07:47:28,666 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.11 vs. limit=10.0 2023-11-27 07:47:31,414 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.32 vs. limit=22.5 2023-11-27 07:47:35,702 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.963e+01 9.073e+01 9.744e+01 1.047e+02 1.558e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-27 07:47:39,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3797626.6666666665, ans=0.0 2023-11-27 07:47:39,546 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.82 vs. limit=15.0 2023-11-27 07:47:40,055 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569650 2023-11-27 07:47:59,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3797760.0, ans=0.125 2023-11-27 07:48:08,489 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4550, loss[loss=0.06232, simple_loss=0.08845, pruned_loss=0.01049, audio_tagging_loss=0.007605, over 14136.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08944, pruned_loss=0.01201, audio_tagging_loss=0.008518, over 3044650.39 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:48:35,672 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569700 2023-11-27 07:48:42,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3798026.6666666665, ans=0.125 2023-11-27 07:48:51,735 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.74 vs. limit=15.0 2023-11-27 07:48:52,298 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:48:55,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3798093.3333333335, ans=0.125 2023-11-27 07:49:05,197 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4600, loss[loss=0.08842, simple_loss=0.1212, pruned_loss=0.0196, audio_tagging_loss=0.008219, over 16111.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08924, pruned_loss=0.01201, audio_tagging_loss=0.008628, over 3039694.44 frames. 
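The WARNING above shows why the dummy-text AudioSet cuts get excluded: a transducer alignment must emit every token from at most T encoder frames, so a cut whose subsampled length (23) is shorter than its token sequence (24) is unusable. A sketch of that filter follows; the length formula is an assumption chosen to reproduce the logged 100 -> 23 mapping, since the real one depends on the convolutional frontend.

```python
def subsampled_len(num_frames: int) -> int:
    # Assumed ~4x-subsampling frontend; maps 100 frames to 23 as in the log.
    return (num_frames - 8) // 4

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Transducer constraint: need at least one frame per emitted token.
    return subsampled_len(num_frames) >= num_tokens

print(keep_cut(100, 24))  # False -> the 1-second cut above is dropped
```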
], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:49:11,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3798160.0, ans=0.125 2023-11-27 07:49:23,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3798226.6666666665, ans=0.2 2023-11-27 07:49:27,110 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.887e+01 8.803e+01 9.390e+01 1.011e+02 1.144e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-27 07:49:31,988 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569750 2023-11-27 07:49:33,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3798293.3333333335, ans=0.0 2023-11-27 07:49:59,158 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.76 vs. limit=15.0 2023-11-27 07:50:00,884 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4650, loss[loss=0.05501, simple_loss=0.06798, pruned_loss=0.01049, audio_tagging_loss=0.01053, over 15369.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08914, pruned_loss=0.012, audio_tagging_loss=0.008679, over 3042464.67 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:50:01,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3798493.3333333335, ans=0.0 2023-11-27 07:50:01,451 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.65 vs. limit=22.5 2023-11-27 07:50:13,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3798560.0, ans=0.125 2023-11-27 07:50:26,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3798626.6666666665, ans=0.0 2023-11-27 07:50:27,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3798626.6666666665, ans=0.125 2023-11-27 07:50:28,649 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569800 2023-11-27 07:50:31,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3798626.6666666665, ans=0.0 2023-11-27 07:50:43,300 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.71 vs. limit=15.0 2023-11-27 07:50:45,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3798760.0, ans=0.2 2023-11-27 07:50:57,441 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4700, loss[loss=0.06961, simple_loss=0.09617, pruned_loss=0.01312, audio_tagging_loss=0.008408, over 14908.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08869, pruned_loss=0.01186, audio_tagging_loss=0.008774, over 3040337.15 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:50:58,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3798826.6666666665, ans=0.125 2023-11-27 07:51:06,044 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.72 vs. 
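In every "Clipping_scale=2.0, grad-norm quartiles ..." entry here, the threshold is exactly the clipping scale times the middle quartile (2 x 9.390e+01 = 1.878e+02 just above), which points to median-based clipping over a window of recent gradient norms. The sketch below follows that assumption; the window length and the percent-clipped bookkeeping are guesses.

```python
import torch

class MedianGradClipper:
    # Assumed reconstruction of the optim.py logging: track recent gradient
    # norms, clip at clipping_scale * median, report min/25%/50%/75%/max of
    # the window plus the share of entries above the threshold.
    def __init__(self, clipping_scale: float = 2.0, window: int = 400):
        self.scale, self.window, self.norms = clipping_scale, window, []

    def step(self, parameters) -> None:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.cat([p.grad.flatten() for p in params]).norm()
        self.norms = (self.norms + [norm.item()])[-self.window:]
        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.scale * q[2].item()
        pct = 100.0 * sum(n > threshold for n in self.norms) / len(self.norms)
        print(f"grad-norm quartiles {' '.join(f'{v:.3e}' for v in q.tolist())}, "
              f"threshold={threshold:.3e}, percent-clipped={pct:.1f}")
        if norm > threshold:              # rescale the gradients, don't skip the step
            for p in params:
                p.grad.mul_(threshold / norm)
```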
limit=12.0 2023-11-27 07:51:12,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3798893.3333333335, ans=0.025 2023-11-27 07:51:19,709 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.561e+01 8.828e+01 9.714e+01 1.039e+02 1.424e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-27 07:51:23,963 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569850 2023-11-27 07:51:37,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3799026.6666666665, ans=0.0 2023-11-27 07:51:43,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3799093.3333333335, ans=0.125 2023-11-27 07:51:53,640 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4750, loss[loss=0.04805, simple_loss=0.06685, pruned_loss=0.004302, audio_tagging_loss=0.01032, over 14085.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08854, pruned_loss=0.01187, audio_tagging_loss=0.008857, over 3042760.01 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:52:06,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3799226.6666666665, ans=0.0 2023-11-27 07:52:08,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3799226.6666666665, ans=0.0 2023-11-27 07:52:16,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3799293.3333333335, ans=0.95 2023-11-27 07:52:20,355 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569900 2023-11-27 07:52:26,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3799360.0, ans=0.1 2023-11-27 07:52:48,851 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4800, loss[loss=0.06763, simple_loss=0.08926, pruned_loss=0.014, audio_tagging_loss=0.009004, over 14733.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08874, pruned_loss=0.01191, audio_tagging_loss=0.008864, over 3043921.02 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:52:53,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3799493.3333333335, ans=0.1 2023-11-27 07:53:10,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3799626.6666666665, ans=0.125 2023-11-27 07:53:11,650 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.066e+01 9.043e+01 9.728e+01 1.036e+02 1.523e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-27 07:53:12,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3799626.6666666665, ans=0.125 2023-11-27 07:53:16,516 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 569950 2023-11-27 07:53:37,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3799760.0, ans=0.125 2023-11-27 07:53:45,020 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4850, loss[loss=0.07517, simple_loss=0.1008, pruned_loss=0.01691, audio_tagging_loss=0.007845, over 15082.00 frames. 
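The grad_scale field stepping between 8.0, 16.0, and 32.0 across these entries (16.0 at batch 4750, 32.0 at batch 4800) is the dynamic fp16 loss scale. A minimal sketch of the standard torch.cuda.amp pattern that produces such values; the model and optimizer here are placeholders, not the run's.

```python
import torch

model = torch.nn.Linear(80, 500).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.045)
scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)

x = torch.randn(8, 80, device="cuda")
with torch.cuda.amp.autocast(dtype=torch.float16):
    loss = model(x).square().mean()
scaler.scale(loss).backward()   # backward pass on the scaled loss
scaler.step(opt)                # unscales grads; skips the step on inf/nan
scaler.update()                 # halves the scale on overflow, doubles it
                                # after growth_interval clean steps (16 -> 32)
print(scaler.get_scale())
```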
], tot_loss[loss=0.06543, simple_loss=0.08928, pruned_loss=0.01193, audio_tagging_loss=0.008863, over 3043668.81 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:54:11,675 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570000 2023-11-27 07:54:41,097 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4900, loss[loss=0.04977, simple_loss=0.0594, pruned_loss=0.008339, audio_tagging_loss=0.01173, over 15259.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08943, pruned_loss=0.01189, audio_tagging_loss=0.00868, over 3046937.71 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:54:43,756 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2023-11-27 07:54:46,651 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:54:50,000 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.00 vs. limit=15.0 2023-11-27 07:54:51,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3800226.6666666665, ans=0.1 2023-11-27 07:54:54,313 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.19 vs. limit=15.0 2023-11-27 07:54:59,621 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.12 vs. limit=12.0 2023-11-27 07:55:02,212 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.718e+01 9.110e+01 9.715e+01 1.029e+02 1.331e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-27 07:55:06,615 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570050 2023-11-27 07:55:20,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3800360.0, ans=0.125 2023-11-27 07:55:25,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3800426.6666666665, ans=0.2 2023-11-27 07:55:36,379 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 4950, loss[loss=0.06532, simple_loss=0.09263, pruned_loss=0.01237, audio_tagging_loss=0.006638, over 14391.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08962, pruned_loss=0.01189, audio_tagging_loss=0.008628, over 3050703.45 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:55:54,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3800560.0, ans=0.0 2023-11-27 07:56:04,234 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570100 2023-11-27 07:56:12,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3800693.3333333335, ans=0.025 2023-11-27 07:56:29,257 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=15.0 2023-11-27 07:56:30,257 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.57 vs. 
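Note that tot_loss[...] is not the raw batch loss: its "over N frames" count hovers near 3.0e6 while individual batches carry only ~15k frames, which reads as a decaying, frame-weighted running average. A sketch of that bookkeeping follows; the decay constant is illustrative, chosen so the steady-state window lands near the logged frame counts.

```python
class RunningLoss:
    # Decaying, frame-weighted average; with ~15k frames per batch and
    # decay=0.995 the effective window settles near 15000/0.005 = 3.0e6
    # frames, matching the "over N frames" counts (decay value assumed).
    def __init__(self, decay: float = 0.995):
        self.decay, self.frames, self.sums = decay, 0.0, {}

    def update(self, frames: float, **losses: float) -> None:
        self.frames = self.decay * self.frames + frames
        for name, value in losses.items():
            self.sums[name] = self.decay * self.sums.get(name, 0.0) + frames * value

    def averages(self) -> dict:
        return {name: s / self.frames for name, s in self.sums.items()}

tot = RunningLoss()
for _ in range(2000):
    tot.update(15000, loss=0.065)
print(tot.averages(), f"over {tot.frames:.2f} frames")
```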
limit=15.0 2023-11-27 07:56:31,913 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5000, loss[loss=0.07696, simple_loss=0.1077, pruned_loss=0.01329, audio_tagging_loss=0.009803, over 15726.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08914, pruned_loss=0.01181, audio_tagging_loss=0.008656, over 3052564.95 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:56:35,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3800826.6666666665, ans=0.125 2023-11-27 07:56:41,927 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.97 vs. limit=10.0 2023-11-27 07:56:55,301 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.720e+01 8.884e+01 9.444e+01 1.042e+02 1.203e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-27 07:56:55,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3800960.0, ans=0.125 2023-11-27 07:56:59,615 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570150 2023-11-27 07:57:07,623 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=22.5 2023-11-27 07:57:23,298 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:57:26,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3801093.3333333335, ans=0.125 2023-11-27 07:57:29,492 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5050, loss[loss=0.07167, simple_loss=0.1002, pruned_loss=0.01135, audio_tagging_loss=0.01023, over 15174.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08851, pruned_loss=0.01164, audio_tagging_loss=0.008605, over 3054342.91 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:57:30,028 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.20 vs. limit=15.0 2023-11-27 07:57:43,919 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.50 vs. limit=22.5 2023-11-27 07:57:46,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3801226.6666666665, ans=0.125 2023-11-27 07:57:53,465 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.25 vs. 
limit=22.5 2023-11-27 07:57:54,979 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570200 2023-11-27 07:57:55,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=3801293.3333333335, ans=0.1 2023-11-27 07:57:56,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=3801293.3333333335, ans=0.05 2023-11-27 07:57:56,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3801293.3333333335, ans=0.1 2023-11-27 07:58:07,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3801360.0, ans=0.1 2023-11-27 07:58:08,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3801360.0, ans=0.0 2023-11-27 07:58:14,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3801426.6666666665, ans=15.0 2023-11-27 07:58:23,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3801426.6666666665, ans=0.025 2023-11-27 07:58:24,960 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5100, loss[loss=0.06961, simple_loss=0.09073, pruned_loss=0.01476, audio_tagging_loss=0.009492, over 15287.00 frames. ], tot_loss[loss=0.06435, simple_loss=0.08853, pruned_loss=0.01154, audio_tagging_loss=0.008548, over 3054024.34 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:58:35,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3801560.0, ans=0.07 2023-11-27 07:58:46,393 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.907e+01 8.861e+01 9.486e+01 1.041e+02 1.352e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-27 07:58:51,774 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570250 2023-11-27 07:59:00,268 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.42 vs. limit=15.0 2023-11-27 07:59:01,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3801693.3333333335, ans=0.1 2023-11-27 07:59:06,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3801693.3333333335, ans=0.04949747468305833 2023-11-27 07:59:11,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3801760.0, ans=0.035 2023-11-27 07:59:18,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3801826.6666666665, ans=0.0 2023-11-27 07:59:19,741 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5150, loss[loss=0.0766, simple_loss=0.1002, pruned_loss=0.01932, audio_tagging_loss=0.007157, over 15548.00 frames. ], tot_loss[loss=0.06461, simple_loss=0.08877, pruned_loss=0.01166, audio_tagging_loss=0.008573, over 3061285.44 frames. 
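Nearly every ScheduledFloat entry reports a flat ans (0.125, 0.1, 0.2, and even a whitening_limit of 15.0 above) because batch_count ~ 3.8e6 is far past the last schedule knot. The sketch below assumes ScheduledFloat is a piecewise-linear interpolation keyed on batch_count; the knot values are illustrative, not the run's schedules.

```python
class ScheduledFloat:
    # Minimal sketch of a batch-count-keyed schedule; the real icefall
    # ScheduledFloat lives in scaling.py and differs in detail.
    def __init__(self, *points):
        self.points = sorted(points)      # (batch_count, value) knots
        self.batch_count = 0.0            # advanced by the training loop

    def value(self) -> float:
        pts = self.points
        if self.batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if self.batch_count <= x1:
                return y0 + (self.batch_count - x0) / (x1 - x0) * (y1 - y0)
        return pts[-1][1]

prob = ScheduledFloat((0.0, 0.3), (20000.0, 0.125))
prob.batch_count = 3802493.0              # far past the last knot
print(prob.value())                       # -> 0.125, hence the flat values above
```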
], batch size: 60, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:59:21,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3801826.6666666665, ans=0.09899494936611666 2023-11-27 07:59:23,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3801826.6666666665, ans=0.2 2023-11-27 07:59:31,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3801893.3333333335, ans=0.2 2023-11-27 07:59:39,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3801893.3333333335, ans=0.0 2023-11-27 07:59:46,786 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570300 2023-11-27 08:00:14,899 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.76 vs. limit=22.5 2023-11-27 08:00:15,512 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5200, loss[loss=0.06555, simple_loss=0.09267, pruned_loss=0.0132, audio_tagging_loss=0.006016, over 15109.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08961, pruned_loss=0.01177, audio_tagging_loss=0.008528, over 3059962.13 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 08:00:34,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3802226.6666666665, ans=0.125 2023-11-27 08:00:39,804 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.659e+01 9.152e+01 9.640e+01 1.026e+02 1.239e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-27 08:00:42,057 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570350 2023-11-27 08:00:51,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3802360.0, ans=0.125 2023-11-27 08:01:05,957 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.69 vs. limit=15.0 2023-11-27 08:01:11,805 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5250, loss[loss=0.06224, simple_loss=0.08157, pruned_loss=0.01147, audio_tagging_loss=0.009982, over 16032.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08939, pruned_loss=0.01182, audio_tagging_loss=0.008544, over 3059233.60 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 08:01:12,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3802493.3333333335, ans=0.125 2023-11-27 08:01:16,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3802493.3333333335, ans=0.125 2023-11-27 08:01:29,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3802560.0, ans=0.125 2023-11-27 08:01:29,483 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. 
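The balancer fields that keep appearing (prob, min_positive, max_positive, min_abs, max_abs) suggest per-channel activation constraints: the share of positive values and the mean absolute value per channel, checked only on a prob-sized random fraction of batches. The function below is a guess at the statistics being constrained; how the real Balancer converts violations into gradient corrections is not reproduced here.

```python
import torch

def balancer_violations(x: torch.Tensor, min_positive=0.05, max_positive=0.95,
                        min_abs=0.2, max_abs=10.0) -> dict:
    # Count channels whose positive fraction or mean |activation| falls
    # outside the configured band (all thresholds here are illustrative).
    x = x.reshape(-1, x.shape[-1])
    frac_pos = (x > 0).float().mean(0)
    mean_abs = x.abs().mean(0)
    return {
        "below_min_positive": int((frac_pos < min_positive).sum()),
        "above_max_positive": int((frac_pos > max_positive).sum()),
        "below_min_abs": int((mean_abs < min_abs).sum()),
        "above_max_abs": int((mean_abs > max_abs).sum()),
    }

print(balancer_violations(torch.randn(1000, 256)))
```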
limit=6.0 2023-11-27 08:01:38,160 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570400 2023-11-27 08:02:05,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=3802760.0, ans=15.0 2023-11-27 08:02:07,494 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5300, loss[loss=0.05976, simple_loss=0.08006, pruned_loss=0.01176, audio_tagging_loss=0.007972, over 16015.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.0896, pruned_loss=0.01182, audio_tagging_loss=0.008528, over 3059071.68 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 08:02:09,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3802826.6666666665, ans=0.05 2023-11-27 08:02:11,271 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.61 vs. limit=15.0 2023-11-27 08:02:17,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3802893.3333333335, ans=0.0 2023-11-27 08:02:33,175 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.488e+01 9.130e+01 9.779e+01 1.044e+02 2.518e+02, threshold=1.956e+02, percent-clipped=1.0 2023-11-27 08:02:35,413 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570450 2023-11-27 08:02:41,354 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.16 vs. limit=15.0 2023-11-27 08:02:42,040 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.73 vs. limit=10.0 2023-11-27 08:02:46,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3803026.6666666665, ans=0.1 2023-11-27 08:02:46,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3803026.6666666665, ans=0.125 2023-11-27 08:03:03,254 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5350, loss[loss=0.05375, simple_loss=0.06922, pruned_loss=0.0104, audio_tagging_loss=0.008749, over 14620.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.0894, pruned_loss=0.0118, audio_tagging_loss=0.008557, over 3052360.80 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:03:20,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3803226.6666666665, ans=0.125 2023-11-27 08:03:29,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3803293.3333333335, ans=0.1 2023-11-27 08:03:29,938 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.47 vs. limit=15.0 2023-11-27 08:03:30,102 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.88 vs. 
limit=15.0 2023-11-27 08:03:30,612 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570500 2023-11-27 08:03:43,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3803360.0, ans=0.0 2023-11-27 08:03:59,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3803493.3333333335, ans=0.125 2023-11-27 08:04:00,237 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5400, loss[loss=0.06512, simple_loss=0.09334, pruned_loss=0.009092, audio_tagging_loss=0.009358, over 14714.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08921, pruned_loss=0.01171, audio_tagging_loss=0.008559, over 3048550.95 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:04:05,949 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.64 vs. limit=22.5 2023-11-27 08:04:11,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3803560.0, ans=0.0 2023-11-27 08:04:25,131 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 8.928e+01 9.462e+01 1.035e+02 1.260e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-27 08:04:26,237 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570550 2023-11-27 08:04:34,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3803693.3333333335, ans=0.125 2023-11-27 08:04:54,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3803826.6666666665, ans=0.0 2023-11-27 08:04:55,117 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5450, loss[loss=0.06083, simple_loss=0.0881, pruned_loss=0.008288, audio_tagging_loss=0.008489, over 14669.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08951, pruned_loss=0.01177, audio_tagging_loss=0.008629, over 3043552.95 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:05:04,830 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.39 vs. limit=15.0 2023-11-27 08:05:05,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3803893.3333333335, ans=0.125 2023-11-27 08:05:22,194 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570600 2023-11-27 08:05:22,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3803960.0, ans=0.1 2023-11-27 08:05:39,443 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.69 vs. limit=15.0 2023-11-27 08:05:50,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3804160.0, ans=0.125 2023-11-27 08:05:51,033 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5500, loss[loss=0.06518, simple_loss=0.0874, pruned_loss=0.01294, audio_tagging_loss=0.008538, over 14828.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08963, pruned_loss=0.01178, audio_tagging_loss=0.008575, over 3053460.18 frames. 
], batch size: 54, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:06:11,235 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.72 vs. limit=15.0 2023-11-27 08:06:16,934 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.810e+01 9.180e+01 9.726e+01 1.043e+02 1.311e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-27 08:06:18,117 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570650 2023-11-27 08:06:22,904 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.89 vs. limit=15.0 2023-11-27 08:06:24,830 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:06:31,811 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.13 vs. limit=15.0 2023-11-27 08:06:47,586 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5550, loss[loss=0.06475, simple_loss=0.09286, pruned_loss=0.009115, audio_tagging_loss=0.009205, over 15548.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08945, pruned_loss=0.01182, audio_tagging_loss=0.008636, over 3057781.35 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:07:14,279 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570700 2023-11-27 08:07:15,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3804626.6666666665, ans=10.0 2023-11-27 08:07:20,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3804693.3333333335, ans=0.1 2023-11-27 08:07:43,566 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5600, loss[loss=0.06759, simple_loss=0.0955, pruned_loss=0.01141, audio_tagging_loss=0.008431, over 14759.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08943, pruned_loss=0.01181, audio_tagging_loss=0.008759, over 3054734.04 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 08:07:46,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3804826.6666666665, ans=0.125 2023-11-27 08:07:47,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3804826.6666666665, ans=0.125 2023-11-27 08:07:55,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3804893.3333333335, ans=10.0 2023-11-27 08:08:10,531 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.878e+01 8.987e+01 9.756e+01 1.044e+02 1.605e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-27 08:08:10,618 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570750 2023-11-27 08:08:11,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3804960.0, ans=0.125 2023-11-27 08:08:20,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3805026.6666666665, ans=0.2 2023-11-27 08:08:23,713 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 08:08:33,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3805093.3333333335, ans=0.0 2023-11-27 08:08:36,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3805093.3333333335, ans=0.2 2023-11-27 08:08:39,232 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5650, loss[loss=0.07112, simple_loss=0.09712, pruned_loss=0.01391, audio_tagging_loss=0.008647, over 16044.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08989, pruned_loss=0.01177, audio_tagging_loss=0.008806, over 3058892.20 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:08:40,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3805160.0, ans=0.125 2023-11-27 08:09:06,190 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570800 2023-11-27 08:09:11,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3805360.0, ans=0.0 2023-11-27 08:09:35,560 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5700, loss[loss=0.05479, simple_loss=0.07813, pruned_loss=0.006535, audio_tagging_loss=0.009196, over 14711.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08975, pruned_loss=0.0117, audio_tagging_loss=0.008764, over 3054343.43 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:09:41,205 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=15.0 2023-11-27 08:09:42,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3805493.3333333335, ans=0.125 2023-11-27 08:09:49,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3805560.0, ans=0.125 2023-11-27 08:10:01,767 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.519e+01 8.888e+01 9.534e+01 1.037e+02 1.369e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 08:10:01,872 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570850 2023-11-27 08:10:05,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3805626.6666666665, ans=0.125 2023-11-27 08:10:13,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3805693.3333333335, ans=10.0 2023-11-27 08:10:26,535 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.46 vs. limit=15.0 2023-11-27 08:10:30,374 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.28 vs. 
limit=22.5 2023-11-27 08:10:30,868 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5750, loss[loss=0.05568, simple_loss=0.07665, pruned_loss=0.01084, audio_tagging_loss=0.006518, over 15037.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08846, pruned_loss=0.01149, audio_tagging_loss=0.008779, over 3049160.19 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:10:43,705 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.60 vs. limit=22.5 2023-11-27 08:10:58,298 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570900 2023-11-27 08:11:15,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3806093.3333333335, ans=0.07 2023-11-27 08:11:27,078 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5800, loss[loss=0.06604, simple_loss=0.0854, pruned_loss=0.0155, audio_tagging_loss=0.007844, over 14451.00 frames. ], tot_loss[loss=0.06447, simple_loss=0.08842, pruned_loss=0.01158, audio_tagging_loss=0.008684, over 3043129.84 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:11:31,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3806160.0, ans=0.1 2023-11-27 08:11:36,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3806160.0, ans=0.2 2023-11-27 08:11:53,673 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.966e+01 9.206e+01 9.616e+01 1.021e+02 1.551e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-27 08:11:53,815 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 570950 2023-11-27 08:12:05,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3806360.0, ans=0.125 2023-11-27 08:12:08,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3806360.0, ans=0.1 2023-11-27 08:12:14,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3806426.6666666665, ans=0.125 2023-11-27 08:12:16,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3806426.6666666665, ans=0.125 2023-11-27 08:12:16,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3806426.6666666665, ans=0.05 2023-11-27 08:12:21,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3806426.6666666665, ans=0.125 2023-11-27 08:12:23,485 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5850, loss[loss=0.07023, simple_loss=0.1097, pruned_loss=0.009463, audio_tagging_loss=0.005904, over 14562.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.0891, pruned_loss=0.01162, audio_tagging_loss=0.008619, over 3038976.28 frames. 
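The various *_skip_rate entries (0.07 for a bypass above, 0.0 for attention skip rates that have decayed away) read as stochastic branch dropping: during training, a residual branch is skipped outright with probability skip_rate, while inference always runs it. The wrapper below sketches that assumed semantics.

```python
import torch

class SkippableBranch(torch.nn.Module):
    # Assumed meaning of skip_rate: drop the branch for a whole batch with
    # this probability while training; always run it in eval mode.
    def __init__(self, branch: torch.nn.Module, skip_rate: float = 0.07):
        super().__init__()
        self.branch, self.skip_rate = branch, skip_rate

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training and torch.rand(()).item() < self.skip_rate:
            return x                      # branch skipped this batch
        return x + self.branch(x)         # ordinary residual branch otherwise

layer = SkippableBranch(torch.nn.Linear(256, 256))
print(layer(torch.randn(4, 256)).shape)
```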
], batch size: 52, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:12:24,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3806493.3333333335, ans=0.0 2023-11-27 08:12:25,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3806493.3333333335, ans=0.0 2023-11-27 08:12:28,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3806493.3333333335, ans=0.2 2023-11-27 08:12:36,878 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.50 vs. limit=5.0 2023-11-27 08:12:49,868 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571000 2023-11-27 08:12:53,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3806626.6666666665, ans=0.125 2023-11-27 08:13:02,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3806693.3333333335, ans=0.0 2023-11-27 08:13:05,532 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.55 vs. limit=15.0 2023-11-27 08:13:18,658 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5900, loss[loss=0.07023, simple_loss=0.09529, pruned_loss=0.01305, audio_tagging_loss=0.009533, over 15688.00 frames. ], tot_loss[loss=0.06469, simple_loss=0.08905, pruned_loss=0.0116, audio_tagging_loss=0.008558, over 3046914.22 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:13:43,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3806960.0, ans=0.2 2023-11-27 08:13:43,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3806960.0, ans=0.0 2023-11-27 08:13:45,529 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.446e+01 9.203e+01 9.720e+01 1.067e+02 1.821e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-27 08:13:45,625 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571050 2023-11-27 08:13:49,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3806960.0, ans=0.125 2023-11-27 08:14:14,863 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 5950, loss[loss=0.07562, simple_loss=0.105, pruned_loss=0.01588, audio_tagging_loss=0.007232, over 16351.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08968, pruned_loss=0.0118, audio_tagging_loss=0.008446, over 3058757.51 frames. 
], batch size: 61, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:14:15,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3807160.0, ans=0.125 2023-11-27 08:14:24,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3807226.6666666665, ans=0.2 2023-11-27 08:14:29,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3807226.6666666665, ans=0.125 2023-11-27 08:14:36,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3807293.3333333335, ans=0.125 2023-11-27 08:14:41,370 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571100 2023-11-27 08:14:45,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3807293.3333333335, ans=0.125 2023-11-27 08:15:00,750 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.30 vs. limit=12.0 2023-11-27 08:15:02,465 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:15:02,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3807426.6666666665, ans=0.125 2023-11-27 08:15:06,879 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:15:10,286 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6000, loss[loss=0.07299, simple_loss=0.103, pruned_loss=0.01441, audio_tagging_loss=0.007074, over 14161.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08932, pruned_loss=0.01171, audio_tagging_loss=0.008449, over 3060535.42 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 08:15:10,288 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-27 08:15:38,422 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9563, 3.1573, 3.0156, 3.2109, 3.3910, 2.7520, 3.4268, 2.6496], device='cuda:0') 2023-11-27 08:15:42,630 INFO [train_asr.py:1267] (0/4) Epoch 48, validation: loss=0.05815, simple_loss=0.05046, pruned_loss=0.005371, audio_tagging_loss=0.02755, over 4681554.00 frames. 2023-11-27 08:15:42,631 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-27 08:15:57,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3807560.0, ans=0.95 2023-11-27 08:16:10,243 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 8.889e+01 9.644e+01 1.039e+02 1.494e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-27 08:16:10,345 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571150 2023-11-27 08:16:12,919 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.12 vs. limit=22.5 2023-11-27 08:16:23,522 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.03 vs. limit=15.0 2023-11-27 08:16:24,050 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. 
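At batch 6000 the loop pauses to compute validation loss and logs two extra diagnostics: an attention-weights entropy tensor with one value per head (eight here) and the peak CUDA memory. A sketch of both; the entropy form is assumed, and the logged values of roughly 2.6-3.9 are consistent with row entropies bounded by ln(num_keys).

```python
import torch

@torch.no_grad()
def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, batch, query, key), rows summing to 1. Returns one
    # mean entropy per head; zipformer.py's exact reduction may differ.
    p = attn.clamp(min=1e-20)
    return -(p * p.log()).sum(-1).mean(dim=(1, 2))

attn = torch.softmax(torch.randn(8, 2, 50, 50), dim=-1)
print(attn_weights_entropy(attn))         # uniform over 50 keys would give ln 50 ~ 3.9
if torch.cuda.is_available():             # source of the peak-memory line
    print(f"Maximum memory allocated so far is "
          f"{torch.cuda.max_memory_allocated() // 2**20}MB")
```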
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 08:16:28,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3807760.0, ans=0.1 2023-11-27 08:16:30,029 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.42 vs. limit=12.0 2023-11-27 08:16:33,339 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:16:36,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3807760.0, ans=0.07 2023-11-27 08:16:39,038 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6050, loss[loss=0.07774, simple_loss=0.1117, pruned_loss=0.01358, audio_tagging_loss=0.008323, over 15144.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08948, pruned_loss=0.0117, audio_tagging_loss=0.008491, over 3059417.35 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 08:16:54,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3807893.3333333335, ans=10.0 2023-11-27 08:17:05,687 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571200 2023-11-27 08:17:12,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3808026.6666666665, ans=0.125 2023-11-27 08:17:35,805 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6100, loss[loss=0.05134, simple_loss=0.07638, pruned_loss=0.005186, audio_tagging_loss=0.007962, over 14386.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.09046, pruned_loss=0.01191, audio_tagging_loss=0.008467, over 3056026.35 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 08:17:35,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3808160.0, ans=0.0 2023-11-27 08:17:36,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3808160.0, ans=0.125 2023-11-27 08:17:55,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3808226.6666666665, ans=0.2 2023-11-27 08:17:56,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3808293.3333333335, ans=0.2 2023-11-27 08:18:00,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3808293.3333333335, ans=0.1 2023-11-27 08:18:01,817 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.766e+01 9.030e+01 9.632e+01 1.027e+02 1.334e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-27 08:18:01,917 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571250 2023-11-27 08:18:02,255 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.53 vs. 
limit=15.0 2023-11-27 08:18:04,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3808293.3333333335, ans=0.125 2023-11-27 08:18:25,143 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:18:31,161 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6150, loss[loss=0.06005, simple_loss=0.07517, pruned_loss=0.008192, audio_tagging_loss=0.01427, over 14128.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.0899, pruned_loss=0.01182, audio_tagging_loss=0.008444, over 3055818.74 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 08:18:38,153 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2023-11-27 08:18:41,664 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.04 vs. limit=22.5 2023-11-27 08:18:58,668 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571300 2023-11-27 08:18:59,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3808626.6666666665, ans=0.0 2023-11-27 08:19:14,062 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.55 vs. limit=15.0 2023-11-27 08:19:26,894 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6200, loss[loss=0.05988, simple_loss=0.08293, pruned_loss=0.01078, audio_tagging_loss=0.007627, over 16367.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.09014, pruned_loss=0.012, audio_tagging_loss=0.008444, over 3053934.91 frames. ], batch size: 62, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:19:46,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=3808893.3333333335, ans=0.025 2023-11-27 08:19:46,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3808893.3333333335, ans=0.1 2023-11-27 08:19:53,790 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.167e+01 8.915e+01 9.429e+01 1.009e+02 1.347e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-27 08:19:53,901 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571350 2023-11-27 08:20:02,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3809026.6666666665, ans=0.0 2023-11-27 08:20:17,802 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.17 vs. limit=22.5 2023-11-27 08:20:23,701 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6250, loss[loss=0.06974, simple_loss=0.1058, pruned_loss=0.0095, audio_tagging_loss=0.007346, over 15923.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08949, pruned_loss=0.0118, audio_tagging_loss=0.008601, over 3059190.30 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:20:25,508 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.84 vs. 
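The learning rate ticking down from 1.41e-03 to 1.40e-03 between batches 6150 and 6200 is consistent with an Eden-style schedule that decays in both step count and epoch. The formula below is the rule commonly described for icefall's Eden scheduler, treated here as an assumption, with illustrative constants rather than this run's configuration.

```python
def eden_lr(base_lr: float, step: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Two independent quartic-root decay factors, one per step, one per epoch.
    step_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * step_factor * epoch_factor

print(eden_lr(0.045, step=571250, epoch=48))  # ~1.39e-03, near the logged 1.40e-03
```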
limit=22.5 2023-11-27 08:20:31,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3809160.0, ans=0.125 2023-11-27 08:20:32,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3809160.0, ans=0.125 2023-11-27 08:20:44,035 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:20:49,670 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571400 2023-11-27 08:21:11,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3809426.6666666665, ans=0.2 2023-11-27 08:21:17,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3809426.6666666665, ans=0.0 2023-11-27 08:21:19,227 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6300, loss[loss=0.06316, simple_loss=0.09286, pruned_loss=0.01067, audio_tagging_loss=0.006055, over 15635.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08924, pruned_loss=0.01178, audio_tagging_loss=0.008634, over 3054157.56 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:21:24,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3809493.3333333335, ans=0.0 2023-11-27 08:21:33,941 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.21 vs. limit=22.5 2023-11-27 08:21:35,868 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:21:38,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3809560.0, ans=0.125 2023-11-27 08:21:46,278 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.650e+01 8.820e+01 9.366e+01 1.014e+02 1.355e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-27 08:21:46,386 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571450 2023-11-27 08:22:02,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3809693.3333333335, ans=0.0 2023-11-27 08:22:15,185 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6350, loss[loss=0.07253, simple_loss=0.1059, pruned_loss=0.01269, audio_tagging_loss=0.006909, over 14675.00 frames. ], tot_loss[loss=0.06439, simple_loss=0.08795, pruned_loss=0.0116, audio_tagging_loss=0.008818, over 3052239.72 frames. 
], batch size: 55, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 08:22:41,861 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571500 2023-11-27 08:22:56,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3810026.6666666665, ans=0.1 2023-11-27 08:22:56,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3810026.6666666665, ans=0.0 2023-11-27 08:22:58,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3810093.3333333335, ans=0.125 2023-11-27 08:22:59,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3810093.3333333335, ans=0.0 2023-11-27 08:23:02,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3810093.3333333335, ans=0.125 2023-11-27 08:23:11,259 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6400, loss[loss=0.07562, simple_loss=0.1007, pruned_loss=0.01744, audio_tagging_loss=0.00785, over 15195.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08843, pruned_loss=0.01159, audio_tagging_loss=0.008847, over 3054860.72 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:23:22,195 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.56 vs. limit=15.0 2023-11-27 08:23:37,600 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571550 2023-11-27 08:23:39,071 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.509e+01 8.870e+01 9.357e+01 1.034e+02 1.188e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-27 08:23:41,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3810293.3333333335, ans=0.125 2023-11-27 08:23:41,830 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.84 vs. limit=22.5 2023-11-27 08:23:45,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3810360.0, ans=0.125 2023-11-27 08:23:59,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3810426.6666666665, ans=0.125 2023-11-27 08:23:59,505 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.38 vs. limit=22.5 2023-11-27 08:24:06,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3810493.3333333335, ans=0.2 2023-11-27 08:24:07,245 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6450, loss[loss=0.07133, simple_loss=0.09791, pruned_loss=0.01516, audio_tagging_loss=0.007211, over 15484.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08815, pruned_loss=0.01153, audio_tagging_loss=0.008981, over 3051532.27 frames. 
], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:24:20,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3810560.0, ans=0.125 2023-11-27 08:24:27,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3810560.0, ans=0.95 2023-11-27 08:24:33,664 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571600 2023-11-27 08:24:42,167 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.36 vs. limit=10.0 2023-11-27 08:25:02,685 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6500, loss[loss=0.07893, simple_loss=0.1154, pruned_loss=0.01373, audio_tagging_loss=0.007478, over 15768.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08855, pruned_loss=0.01163, audio_tagging_loss=0.008922, over 3058560.09 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:25:06,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3810826.6666666665, ans=0.125 2023-11-27 08:25:14,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3810893.3333333335, ans=0.1 2023-11-27 08:25:18,264 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.39 vs. limit=22.5 2023-11-27 08:25:21,858 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.71 vs. limit=15.0 2023-11-27 08:25:30,468 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571650 2023-11-27 08:25:30,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3810960.0, ans=0.0 2023-11-27 08:25:31,468 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.977e+01 8.956e+01 9.682e+01 1.036e+02 1.299e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-27 08:25:41,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3811026.6666666665, ans=0.025 2023-11-27 08:25:58,445 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6550, loss[loss=0.05951, simple_loss=0.0828, pruned_loss=0.009417, audio_tagging_loss=0.00869, over 15813.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08883, pruned_loss=0.01151, audio_tagging_loss=0.00883, over 3059958.64 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 08:26:16,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3811226.6666666665, ans=0.0 2023-11-27 08:26:20,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3811293.3333333335, ans=0.125 2023-11-27 08:26:25,776 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571700 2023-11-27 08:26:31,251 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.32 vs. 
limit=22.5 2023-11-27 08:26:39,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3811360.0, ans=0.2 2023-11-27 08:26:41,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3811360.0, ans=0.125 2023-11-27 08:26:42,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3811426.6666666665, ans=0.2 2023-11-27 08:26:55,427 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6600, loss[loss=0.06836, simple_loss=0.1038, pruned_loss=0.01004, audio_tagging_loss=0.006427, over 13721.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08949, pruned_loss=0.01172, audio_tagging_loss=0.008647, over 3056460.69 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 08:27:03,192 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=15.0 2023-11-27 08:27:12,351 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.13 vs. limit=12.0 2023-11-27 08:27:15,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3811560.0, ans=0.1 2023-11-27 08:27:21,301 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571750 2023-11-27 08:27:23,321 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.192e+01 9.078e+01 9.642e+01 1.016e+02 1.265e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-27 08:27:25,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3811626.6666666665, ans=0.0 2023-11-27 08:27:35,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3811693.3333333335, ans=0.1 2023-11-27 08:27:36,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3811693.3333333335, ans=0.125 2023-11-27 08:27:42,375 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.93 vs. limit=6.0 2023-11-27 08:27:44,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3811760.0, ans=0.2 2023-11-27 08:27:49,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3811826.6666666665, ans=0.1 2023-11-27 08:27:50,314 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6650, loss[loss=0.07095, simple_loss=0.09349, pruned_loss=0.01643, audio_tagging_loss=0.007773, over 15362.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08947, pruned_loss=0.01165, audio_tagging_loss=0.008585, over 3056177.53 frames. 
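Note: in the optim.py lines, the five grad-norm quartiles read as min / 25% / median / 75% / max over a window of recent batches, and the printed threshold equals Clipping_scale times the median (2.0 * 9.642e+01 = 1.928e+02 in the batch-6600 entry above); percent-clipped is the share of recent batches whose norm exceeded it. A sketch of that relation, illustrative rather than the exact optim.py code:

    def grad_clip_threshold(recent_norms, clipping_scale=2.0):
        # threshold = clipping_scale * median of recent per-batch grad norms
        s = sorted(recent_norms)
        return clipping_scale * s[len(s) // 2]

    # reproduces the batch-6600 entry from its quartiles:
    assert abs(grad_clip_threshold([81.92, 90.78, 96.42, 101.6, 126.5]) - 192.84) < 1e-6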
], batch size: 59, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 08:28:11,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3811893.3333333335, ans=0.125 2023-11-27 08:28:16,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3811960.0, ans=0.125 2023-11-27 08:28:18,110 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571800 2023-11-27 08:28:19,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3811960.0, ans=0.125 2023-11-27 08:28:20,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3811960.0, ans=0.0 2023-11-27 08:28:24,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3812026.6666666665, ans=0.2 2023-11-27 08:28:27,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3812026.6666666665, ans=0.2 2023-11-27 08:28:31,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3812026.6666666665, ans=0.2 2023-11-27 08:28:45,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3812160.0, ans=0.125 2023-11-27 08:28:46,281 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6700, loss[loss=0.06916, simple_loss=0.09255, pruned_loss=0.01404, audio_tagging_loss=0.00884, over 15637.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.09012, pruned_loss=0.01186, audio_tagging_loss=0.008531, over 3050969.05 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 08:29:13,397 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571850 2023-11-27 08:29:15,458 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 9.098e+01 9.634e+01 1.039e+02 1.370e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-27 08:29:16,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3812293.3333333335, ans=0.1 2023-11-27 08:29:18,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3812360.0, ans=0.0 2023-11-27 08:29:24,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3812360.0, ans=0.125 2023-11-27 08:29:36,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3812426.6666666665, ans=0.125 2023-11-27 08:29:42,731 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6750, loss[loss=0.06544, simple_loss=0.08506, pruned_loss=0.01378, audio_tagging_loss=0.009134, over 15314.00 frames. ], tot_loss[loss=0.06446, simple_loss=0.08855, pruned_loss=0.01159, audio_tagging_loss=0.008588, over 3044512.24 frames. 
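Note: the batch_count values in the scaling.py entries run far ahead of the global batch index (batch_count=3812026.67 against batch idx ~571800 above) and are fractional. Both facts are consistent with a duration-adjusted count advancing by world_size * max_duration / ref_duration = 4 * 1000 / 600 = 20/3 per step for this run; the conversion below is an inference from the printed numbers, not quoted from the code:

    def adjusted_batch_count(batch_idx_train, world_size=4,
                             max_duration=1000, ref_duration=600):
        # each optimizer step advances batch_count by 20/3 under these settings
        return batch_idx_train * world_size * max_duration / ref_duration

    # 571804 steps -> 3812026.666..., matching the entries above to the last digit
    assert abs(adjusted_batch_count(571804) - 3812026.6666666665) < 1e-3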
], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 08:29:45,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3812493.3333333335, ans=0.09899494936611666 2023-11-27 08:29:54,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3812560.0, ans=0.125 2023-11-27 08:29:55,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3812560.0, ans=0.0 2023-11-27 08:30:09,239 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571900 2023-11-27 08:30:09,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3812626.6666666665, ans=0.05 2023-11-27 08:30:19,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3812693.3333333335, ans=0.125 2023-11-27 08:30:34,649 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.52 vs. limit=15.0 2023-11-27 08:30:38,305 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6800, loss[loss=0.06685, simple_loss=0.08721, pruned_loss=0.01439, audio_tagging_loss=0.008857, over 17163.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08878, pruned_loss=0.01171, audio_tagging_loss=0.008568, over 3045640.01 frames. ], batch size: 66, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:30:54,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3812893.3333333335, ans=0.125 2023-11-27 08:31:02,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3812960.0, ans=0.125 2023-11-27 08:31:05,393 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 571950 2023-11-27 08:31:07,402 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.719e+01 9.218e+01 9.743e+01 1.054e+02 1.281e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-27 08:31:25,130 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:31:33,862 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6850, loss[loss=0.05381, simple_loss=0.07505, pruned_loss=0.007112, audio_tagging_loss=0.00917, over 15428.00 frames. ], tot_loss[loss=0.06433, simple_loss=0.0883, pruned_loss=0.01171, audio_tagging_loss=0.008469, over 3047370.51 frames. 
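Note: the Whitening lines compare a per-module statistic against a limit (metric=9.52 vs. limit=15.0 above). The metric is 1.0 when the tracked activations have a covariance proportional to the identity and grows as they become less "white"; a penalty only engages once it exceeds the limit, so entries under the limit are purely diagnostic. A sketch of one such statistic for a single group, an approximation of what scaling.py tracks rather than its exact code:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels); returns >= 1.0, with equality
        # exactly when the covariance is a multiple of the identity
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]
        d = cov.shape[0]
        mean_sq_eig = (cov * cov).sum() / d        # trace(C^2)/d
        sq_mean_eig = torch.diag(cov).mean() ** 2  # (trace(C)/d)^2
        return mean_sq_eig / sq_mean_eig

    x = torch.randn(1000, 256)       # near-white features -> metric close to 1
    assert whitening_metric(x) < 2.0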
], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:31:43,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3813160.0, ans=0.125 2023-11-27 08:32:01,240 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572000 2023-11-27 08:32:02,533 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-572000.pt 2023-11-27 08:32:06,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3813293.3333333335, ans=0.07 2023-11-27 08:32:20,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3813426.6666666665, ans=0.125 2023-11-27 08:32:22,786 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.50 vs. limit=10.0 2023-11-27 08:32:28,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3813426.6666666665, ans=0.1 2023-11-27 08:32:30,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3813426.6666666665, ans=0.0 2023-11-27 08:32:32,111 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.91 vs. limit=15.0 2023-11-27 08:32:32,566 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6900, loss[loss=0.0582, simple_loss=0.08207, pruned_loss=0.009038, audio_tagging_loss=0.008131, over 14543.00 frames. ], tot_loss[loss=0.06426, simple_loss=0.08829, pruned_loss=0.01168, audio_tagging_loss=0.008434, over 3041136.23 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:32:48,350 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.95 vs. limit=15.0 2023-11-27 08:32:50,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3813560.0, ans=0.125 2023-11-27 08:32:50,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3813560.0, ans=0.025 2023-11-27 08:32:52,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3813560.0, ans=0.2 2023-11-27 08:32:56,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3813626.6666666665, ans=0.0 2023-11-27 08:32:58,976 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572050 2023-11-27 08:32:59,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3813626.6666666665, ans=0.125 2023-11-27 08:33:01,023 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.610e+01 8.797e+01 9.367e+01 1.009e+02 1.933e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-27 08:33:06,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3813693.3333333335, ans=0.0 2023-11-27 08:33:15,896 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 08:33:28,075 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 6950, loss[loss=0.06831, simple_loss=0.09096, pruned_loss=0.01447, audio_tagging_loss=0.008357, over 14076.00 frames. ], tot_loss[loss=0.06418, simple_loss=0.08826, pruned_loss=0.01156, audio_tagging_loss=0.008493, over 3045202.90 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:33:54,714 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.12 vs. limit=15.0 2023-11-27 08:33:55,164 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572100 2023-11-27 08:33:57,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3813960.0, ans=0.125 2023-11-27 08:34:08,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3814026.6666666665, ans=0.125 2023-11-27 08:34:12,104 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.42 vs. limit=22.5 2023-11-27 08:34:23,571 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7000, loss[loss=0.05155, simple_loss=0.06424, pruned_loss=0.007723, audio_tagging_loss=0.0117, over 15287.00 frames. ], tot_loss[loss=0.06435, simple_loss=0.08816, pruned_loss=0.01161, audio_tagging_loss=0.008651, over 3042682.80 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:34:24,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3814160.0, ans=0.1 2023-11-27 08:34:40,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3814226.6666666665, ans=0.125 2023-11-27 08:34:48,661 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.89 vs. limit=15.0 2023-11-27 08:34:48,828 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.21 vs. limit=15.0 2023-11-27 08:34:50,193 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572150 2023-11-27 08:34:52,165 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 9.213e+01 9.596e+01 1.029e+02 1.427e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-27 08:35:11,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3814426.6666666665, ans=0.125 2023-11-27 08:35:16,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3814426.6666666665, ans=0.125 2023-11-27 08:35:19,319 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7050, loss[loss=0.07813, simple_loss=0.09821, pruned_loss=0.01976, audio_tagging_loss=0.009272, over 15910.00 frames. 
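Note: the WARNING above drops an AudioSet cut whose transcript is the one-second dummy placeholder: after the frontend's 4x subsampling only 23 output frames remain, fewer than the 24 BPE tokens, so the transducer loss has no valid alignment. A sketch of the kind of filter that produces this warning; the subsampling formula is inferred from the 100 -> 23 numbers in the message, not quoted from the code:

    def frames_after_subsampling(num_frames: int) -> int:
        # two stride-2 stages, consistent with 100 frames in -> 23 frames out
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # a transducer cannot emit more tokens than it has output frames
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)   # the excluded cut: 23 frames < 24 tokens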
], tot_loss[loss=0.06464, simple_loss=0.08851, pruned_loss=0.01168, audio_tagging_loss=0.008708, over 3039671.02 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:35:46,003 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572200 2023-11-27 08:35:49,824 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.00 vs. limit=12.0 2023-11-27 08:36:11,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3814760.0, ans=0.125 2023-11-27 08:36:14,655 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7100, loss[loss=0.08055, simple_loss=0.1008, pruned_loss=0.02163, audio_tagging_loss=0.008506, over 15164.00 frames. ], tot_loss[loss=0.06453, simple_loss=0.08818, pruned_loss=0.01168, audio_tagging_loss=0.008755, over 3043117.53 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:36:30,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3814893.3333333335, ans=0.125 2023-11-27 08:36:32,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3814893.3333333335, ans=0.2 2023-11-27 08:36:42,596 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572250 2023-11-27 08:36:44,591 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.212e+01 9.022e+01 9.654e+01 1.030e+02 1.274e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-27 08:37:11,108 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7150, loss[loss=0.06037, simple_loss=0.079, pruned_loss=0.008461, audio_tagging_loss=0.01241, over 14967.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08826, pruned_loss=0.01173, audio_tagging_loss=0.008796, over 3044545.81 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:37:17,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3815160.0, ans=0.1 2023-11-27 08:37:31,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3815226.6666666665, ans=0.0 2023-11-27 08:37:33,882 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:37:38,041 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572300 2023-11-27 08:37:51,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3815360.0, ans=0.125 2023-11-27 08:37:56,770 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.35 vs. limit=15.0 2023-11-27 08:37:58,744 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.23 vs. 
limit=15.0 2023-11-27 08:38:01,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3815426.6666666665, ans=0.125 2023-11-27 08:38:05,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3815426.6666666665, ans=0.125 2023-11-27 08:38:07,767 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7200, loss[loss=0.0592, simple_loss=0.08157, pruned_loss=0.01004, audio_tagging_loss=0.008374, over 15184.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08773, pruned_loss=0.01182, audio_tagging_loss=0.00887, over 3043678.55 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:38:28,580 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.33 vs. limit=15.0 2023-11-27 08:38:33,736 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572350 2023-11-27 08:38:35,753 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.066e+01 9.027e+01 9.481e+01 1.011e+02 1.295e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-27 08:38:55,227 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:39:02,505 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7250, loss[loss=0.06379, simple_loss=0.092, pruned_loss=0.008654, audio_tagging_loss=0.009132, over 15229.00 frames. ], tot_loss[loss=0.06434, simple_loss=0.08768, pruned_loss=0.01167, audio_tagging_loss=0.00883, over 3039001.84 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:39:03,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3815826.6666666665, ans=0.125 2023-11-27 08:39:18,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3815893.3333333335, ans=0.0 2023-11-27 08:39:29,499 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572400 2023-11-27 08:39:31,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3815960.0, ans=0.1 2023-11-27 08:39:39,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3816026.6666666665, ans=0.1 2023-11-27 08:39:53,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3816093.3333333335, ans=0.125 2023-11-27 08:39:53,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3816093.3333333335, ans=0.0 2023-11-27 08:39:58,258 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7300, loss[loss=0.05449, simple_loss=0.07517, pruned_loss=0.007645, audio_tagging_loss=0.009257, over 14749.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08813, pruned_loss=0.01171, audio_tagging_loss=0.008788, over 3038381.78 frames. 
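Note: every summary above prints the same lr: 1.40e-03. That is what an Eden-style schedule yields this deep into training given this run's base_lr=0.045, lr_batches=7500 and lr_epochs=3.5; the formula below is assumed to match icefall's Eden scheduler and agrees with the printed value up to rounding of the scheduler's exact epoch/batch inputs:

    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        # lr = base_lr * ((batch/lr_batches)^2 + 1)^-0.25
        #              * ((epoch/lr_epochs)^2 + 1)^-0.25
        return (base_lr
                * ((batch / lr_batches) ** 2 + 1) ** -0.25
                * ((epoch / lr_epochs) ** 2 + 1) ** -0.25)

    # epoch 48, global batch ~571400 -> ~1.39e-03, i.e. the printed 1.40e-03
    print(f"{eden_lr(0.045, 571400, 48):.2e}")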
], batch size: 54, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:40:00,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3816160.0, ans=0.125 2023-11-27 08:40:07,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3816160.0, ans=0.125 2023-11-27 08:40:15,278 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.84 vs. limit=22.5 2023-11-27 08:40:18,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3816226.6666666665, ans=0.0 2023-11-27 08:40:25,388 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572450 2023-11-27 08:40:25,775 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.79 vs. limit=15.0 2023-11-27 08:40:27,400 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.653e+01 9.262e+01 9.740e+01 1.057e+02 1.335e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-27 08:40:32,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3816360.0, ans=0.0 2023-11-27 08:40:42,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3816426.6666666665, ans=0.2 2023-11-27 08:40:50,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3816426.6666666665, ans=0.125 2023-11-27 08:40:54,389 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7350, loss[loss=0.08022, simple_loss=0.1072, pruned_loss=0.02032, audio_tagging_loss=0.006285, over 15427.00 frames. ], tot_loss[loss=0.0644, simple_loss=0.08803, pruned_loss=0.01177, audio_tagging_loss=0.008608, over 3039179.99 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:41:09,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3816560.0, ans=0.1 2023-11-27 08:41:12,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3816560.0, ans=0.0 2023-11-27 08:41:20,317 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572500 2023-11-27 08:41:49,714 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7400, loss[loss=0.05474, simple_loss=0.06629, pruned_loss=0.01136, audio_tagging_loss=0.01023, over 15523.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08847, pruned_loss=0.01176, audio_tagging_loss=0.00848, over 3040594.84 frames. 
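Note: the grad_scale field oscillates across the summaries (16.0 at batch 6300, 8.0 at 6350, back to 16.0, 32.0 around batch 7200, 16.0 again at 7400). With fp16 training enabled this is the signature of dynamic loss scaling: the scale is halved whenever a batch yields inf/nan gradients and doubled again after a run of clean steps. A minimal sketch of the standard torch.cuda.amp.GradScaler loop with stand-in model names; the run's actual loop lives in train_asr.py:

    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torch.nn.Linear(80, 500).to(device)   # stand-in for the Zipformer
    optimizer = torch.optim.SGD(model.parameters(), lr=1.4e-3)
    scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

    features = torch.randn(8, 80, device=device)
    with torch.cuda.amp.autocast(enabled=(device == "cuda")):
        loss = model(features).square().mean()
    scaler.scale(loss).backward()  # backward through the scaled loss
    scaler.step(optimizer)         # unscales grads; skips the step on inf/nan
    scaler.update()                # scale /= 2 after a skipped step, *= 2 after
                                   # enough consecutive clean steps
    print(scaler.get_scale())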
], batch size: 61, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:42:10,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3816893.3333333335, ans=0.125 2023-11-27 08:42:16,183 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572550 2023-11-27 08:42:19,727 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.911e+01 9.063e+01 9.701e+01 1.022e+02 1.505e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-27 08:42:33,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3817093.3333333335, ans=0.125 2023-11-27 08:42:40,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3817093.3333333335, ans=0.1 2023-11-27 08:42:41,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3817093.3333333335, ans=0.07 2023-11-27 08:42:44,747 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7450, loss[loss=0.0479, simple_loss=0.06407, pruned_loss=0.00647, audio_tagging_loss=0.009398, over 14385.00 frames. ], tot_loss[loss=0.06412, simple_loss=0.08779, pruned_loss=0.01177, audio_tagging_loss=0.008449, over 3039137.16 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:42:51,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3817160.0, ans=0.04949747468305833 2023-11-27 08:42:56,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3817226.6666666665, ans=0.2 2023-11-27 08:43:09,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3817293.3333333335, ans=0.0 2023-11-27 08:43:11,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3817293.3333333335, ans=0.1 2023-11-27 08:43:12,339 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572600 2023-11-27 08:43:13,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3817293.3333333335, ans=0.025 2023-11-27 08:43:20,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3817360.0, ans=0.125 2023-11-27 08:43:21,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3817360.0, ans=0.125 2023-11-27 08:43:22,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3817360.0, ans=0.125 2023-11-27 08:43:33,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3817426.6666666665, ans=0.0 2023-11-27 08:43:41,242 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7500, loss[loss=0.08555, simple_loss=0.1195, pruned_loss=0.01727, audio_tagging_loss=0.008514, over 15211.00 frames. ], tot_loss[loss=0.06429, simple_loss=0.08819, pruned_loss=0.01166, audio_tagging_loss=0.00853, over 3043867.49 frames. 
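Note: the tot_loss[... over ~3.0e6 frames] summaries are not plain epoch averages. The fractional frame counts, hovering near 200x a single batch's ~15k frames, point to an exponentially decayed running sum with an effective window of reset_interval=200 batches; the accumulator below is an inference from those numbers, not the exact metrics-tracking code:

    class RunningSum:
        # decayed accumulator; steady state ~ reset_interval * per-batch value
        def __init__(self, reset_interval: int = 200):
            self.decay = 1.0 - 1.0 / reset_interval
            self.frames = 0.0
            self.loss = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> None:
            self.frames = self.frames * self.decay + batch_frames
            self.loss = self.loss * self.decay + batch_loss * batch_frames

        @property
        def tot_loss(self) -> float:
            return self.loss / self.frames

    r = RunningSum()
    for _ in range(2000):
        r.update(0.065, 15200.0)
    print(round(r.frames, 2))  # ~3.0e6, the scale printed in tot_loss[...]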
], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:43:41,939 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.53 vs. limit=12.0 2023-11-27 08:43:46,930 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.50 vs. limit=15.0 2023-11-27 08:43:48,280 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.14 vs. limit=15.0 2023-11-27 08:43:48,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3817493.3333333335, ans=0.125 2023-11-27 08:44:02,165 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.43 vs. limit=22.5 2023-11-27 08:44:07,885 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572650 2023-11-27 08:44:11,555 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.822e+01 8.822e+01 9.501e+01 1.047e+02 1.367e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-27 08:44:14,266 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.09 vs. limit=15.0 2023-11-27 08:44:18,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3817693.3333333335, ans=0.125 2023-11-27 08:44:18,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3817693.3333333335, ans=0.1 2023-11-27 08:44:33,689 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.38 vs. limit=15.0 2023-11-27 08:44:35,992 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.11 vs. limit=15.0 2023-11-27 08:44:37,567 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7550, loss[loss=0.05706, simple_loss=0.08521, pruned_loss=0.008243, audio_tagging_loss=0.006214, over 16012.00 frames. ], tot_loss[loss=0.06387, simple_loss=0.0877, pruned_loss=0.01153, audio_tagging_loss=0.008489, over 3047804.99 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:44:51,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3817893.3333333335, ans=0.125 2023-11-27 08:45:03,369 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572700 2023-11-27 08:45:06,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3817960.0, ans=0.2 2023-11-27 08:45:23,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3818093.3333333335, ans=0.07 2023-11-27 08:45:26,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3818093.3333333335, ans=0.2 2023-11-27 08:45:32,592 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7600, loss[loss=0.065, simple_loss=0.08385, pruned_loss=0.01407, audio_tagging_loss=0.009, over 14054.00 frames. 
], tot_loss[loss=0.06356, simple_loss=0.08712, pruned_loss=0.01152, audio_tagging_loss=0.00849, over 3048082.38 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:45:59,811 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572750 2023-11-27 08:46:02,835 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.589e+01 8.740e+01 9.501e+01 1.030e+02 1.304e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-27 08:46:05,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3818360.0, ans=0.125 2023-11-27 08:46:06,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3818360.0, ans=0.0 2023-11-27 08:46:12,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3818360.0, ans=0.05 2023-11-27 08:46:16,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3818426.6666666665, ans=0.2 2023-11-27 08:46:17,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3818426.6666666665, ans=0.05 2023-11-27 08:46:25,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3818426.6666666665, ans=0.125 2023-11-27 08:46:28,120 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7650, loss[loss=0.04901, simple_loss=0.06825, pruned_loss=0.003669, audio_tagging_loss=0.01122, over 14810.00 frames. ], tot_loss[loss=0.06419, simple_loss=0.088, pruned_loss=0.01171, audio_tagging_loss=0.008474, over 3039556.71 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:46:48,329 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.72 vs. limit=15.0 2023-11-27 08:46:55,243 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572800 2023-11-27 08:47:02,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3818693.3333333335, ans=0.125 2023-11-27 08:47:24,437 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7700, loss[loss=0.05181, simple_loss=0.07208, pruned_loss=0.007522, audio_tagging_loss=0.008243, over 15385.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08919, pruned_loss=0.01196, audio_tagging_loss=0.008462, over 3036082.63 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:47:30,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3818826.6666666665, ans=0.125 2023-11-27 08:47:39,873 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.37 vs. limit=15.0 2023-11-27 08:47:43,461 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.47 vs. limit=15.0 2023-11-27 08:47:44,648 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.75 vs. 
limit=12.0 2023-11-27 08:47:48,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3818960.0, ans=0.125 2023-11-27 08:47:50,541 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572850 2023-11-27 08:47:53,670 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.196e+01 9.058e+01 9.794e+01 1.057e+02 1.473e+02, threshold=1.959e+02, percent-clipped=0.0 2023-11-27 08:48:12,954 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.27 vs. limit=15.0 2023-11-27 08:48:19,661 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7750, loss[loss=0.06343, simple_loss=0.09109, pruned_loss=0.01074, audio_tagging_loss=0.007153, over 15465.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08944, pruned_loss=0.01192, audio_tagging_loss=0.008461, over 3040379.53 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:48:22,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3819160.0, ans=0.125 2023-11-27 08:48:34,786 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.15 vs. limit=15.0 2023-11-27 08:48:40,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3819226.6666666665, ans=0.05 2023-11-27 08:48:47,339 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572900 2023-11-27 08:48:51,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3819293.3333333335, ans=0.2 2023-11-27 08:48:59,249 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.07 vs. limit=15.0 2023-11-27 08:49:15,356 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7800, loss[loss=0.07402, simple_loss=0.114, pruned_loss=0.009954, audio_tagging_loss=0.007073, over 15358.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.0901, pruned_loss=0.0121, audio_tagging_loss=0.008408, over 3034785.63 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:49:42,517 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 572950 2023-11-27 08:49:45,620 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.310e+01 9.181e+01 9.727e+01 1.046e+02 1.272e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-27 08:49:58,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3819693.3333333335, ans=0.125 2023-11-27 08:50:11,298 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.79 vs. limit=10.0 2023-11-27 08:50:11,739 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7850, loss[loss=0.06591, simple_loss=0.09123, pruned_loss=0.01179, audio_tagging_loss=0.008502, over 15669.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08938, pruned_loss=0.01191, audio_tagging_loss=0.008615, over 3038829.18 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:50:12,522 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.85 vs. 
limit=15.0 2023-11-27 08:50:28,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3819893.3333333335, ans=0.1 2023-11-27 08:50:38,033 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573000 2023-11-27 08:50:38,759 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.63 vs. limit=22.5 2023-11-27 08:50:41,934 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.68 vs. limit=22.5 2023-11-27 08:50:53,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3820026.6666666665, ans=0.0 2023-11-27 08:51:07,217 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7900, loss[loss=0.04813, simple_loss=0.06494, pruned_loss=0.005585, audio_tagging_loss=0.01007, over 16532.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.0896, pruned_loss=0.01199, audio_tagging_loss=0.008613, over 3040222.04 frames. ], batch size: 65, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:51:10,945 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.40 vs. limit=15.0 2023-11-27 08:51:20,525 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:51:34,170 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573050 2023-11-27 08:51:37,299 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.827e+01 9.140e+01 9.856e+01 1.052e+02 1.450e+02, threshold=1.971e+02, percent-clipped=0.0 2023-11-27 08:51:42,913 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.98 vs. limit=15.0 2023-11-27 08:51:43,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3820360.0, ans=0.125 2023-11-27 08:51:49,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3820360.0, ans=0.125 2023-11-27 08:51:54,217 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.74 vs. limit=22.5 2023-11-27 08:51:58,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3820426.6666666665, ans=0.0 2023-11-27 08:51:58,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3820426.6666666665, ans=0.5 2023-11-27 08:52:02,778 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 7950, loss[loss=0.06132, simple_loss=0.09276, pruned_loss=0.008622, audio_tagging_loss=0.006321, over 15432.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.089, pruned_loss=0.01187, audio_tagging_loss=0.008753, over 3040107.42 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:52:06,505 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.50 vs. limit=6.0 2023-11-27 08:52:17,619 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 08:52:21,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3820560.0, ans=0.125 2023-11-27 08:52:24,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3820626.6666666665, ans=0.125 2023-11-27 08:52:27,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3820626.6666666665, ans=0.125 2023-11-27 08:52:29,797 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573100 2023-11-27 08:52:29,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3820626.6666666665, ans=0.125 2023-11-27 08:52:33,380 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0 2023-11-27 08:52:51,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3820760.0, ans=0.0 2023-11-27 08:52:54,568 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.91 vs. limit=15.0 2023-11-27 08:52:59,097 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8000, loss[loss=0.06372, simple_loss=0.09271, pruned_loss=0.01024, audio_tagging_loss=0.007126, over 15392.00 frames. ], tot_loss[loss=0.06453, simple_loss=0.08777, pruned_loss=0.01163, audio_tagging_loss=0.009016, over 3034805.87 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:53:23,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3820960.0, ans=0.035 2023-11-27 08:53:25,493 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573150 2023-11-27 08:53:28,612 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.317e+01 8.923e+01 9.617e+01 1.018e+02 1.242e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-27 08:53:41,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3821026.6666666665, ans=0.1 2023-11-27 08:53:42,030 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2023-11-27 08:53:43,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3821093.3333333335, ans=0.125 2023-11-27 08:53:46,799 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:53:54,533 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8050, loss[loss=0.05862, simple_loss=0.08797, pruned_loss=0.00669, audio_tagging_loss=0.007945, over 14236.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08858, pruned_loss=0.01176, audio_tagging_loss=0.008978, over 3039498.40 frames. 
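Note: the Tokens: [...] list in these warnings is SentencePiece BPE output; the '▁' pieces mark word boundaries, and the dummy transcript tokenizes into 24 pieces. A sketch of reproducing it, assuming the list comes from this run's BPE model (the exact pieces depend on that model file):

    import sentencepiece as spm

    sp = spm.SentencePieceProcessor()
    sp.load("data/lang_bpe_500/bpe.model")  # the BPE model path used by this run

    text = "Dummy text added as a place holder. Please ignore this if possible."
    pieces = sp.encode(text, out_type=str)
    print(pieces)       # pieces carry the '▁' word-boundary marker, as in the log
    print(len(pieces))  # the token count compared against the subsampled frames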
], batch size: 55, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:54:21,102 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573200 2023-11-27 08:54:46,761 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.05 vs. limit=15.0 2023-11-27 08:54:50,365 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8100, loss[loss=0.07286, simple_loss=0.1081, pruned_loss=0.01056, audio_tagging_loss=0.008249, over 15085.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08893, pruned_loss=0.01188, audio_tagging_loss=0.008868, over 3035394.19 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:55:00,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3821560.0, ans=0.1 2023-11-27 08:55:11,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3821626.6666666665, ans=0.125 2023-11-27 08:55:16,147 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:55:17,021 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573250 2023-11-27 08:55:21,705 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.859e+01 8.982e+01 9.731e+01 1.040e+02 1.240e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-27 08:55:26,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3821693.3333333335, ans=0.0 2023-11-27 08:55:46,154 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8150, loss[loss=0.06694, simple_loss=0.09522, pruned_loss=0.009397, audio_tagging_loss=0.00993, over 15282.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08983, pruned_loss=0.01195, audio_tagging_loss=0.00875, over 3038544.95 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:56:12,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3821960.0, ans=0.0 2023-11-27 08:56:13,221 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573300 2023-11-27 08:56:15,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3821960.0, ans=0.1 2023-11-27 08:56:31,952 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.94 vs. limit=10.0 2023-11-27 08:56:41,845 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8200, loss[loss=0.07824, simple_loss=0.1043, pruned_loss=0.01852, audio_tagging_loss=0.007552, over 15420.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08898, pruned_loss=0.0117, audio_tagging_loss=0.008688, over 3041867.49 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:56:42,908 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 08:56:55,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3822226.6666666665, ans=0.0 2023-11-27 08:57:00,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3822226.6666666665, ans=0.125 2023-11-27 08:57:03,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3822293.3333333335, ans=0.2 2023-11-27 08:57:08,860 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573350 2023-11-27 08:57:13,633 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.698e+01 9.014e+01 9.648e+01 1.048e+02 1.501e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-27 08:57:23,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3822360.0, ans=0.1 2023-11-27 08:57:30,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3822426.6666666665, ans=0.125 2023-11-27 08:57:34,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3822426.6666666665, ans=0.0 2023-11-27 08:57:38,058 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8250, loss[loss=0.07103, simple_loss=0.09513, pruned_loss=0.0129, audio_tagging_loss=0.01056, over 14831.00 frames. ], tot_loss[loss=0.06441, simple_loss=0.08845, pruned_loss=0.0116, audio_tagging_loss=0.008583, over 3043384.62 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:57:43,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3822493.3333333335, ans=0.0 2023-11-27 08:57:48,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3822560.0, ans=0.1 2023-11-27 08:58:00,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3822626.6666666665, ans=0.95 2023-11-27 08:58:04,985 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573400 2023-11-27 08:58:34,674 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8300, loss[loss=0.08519, simple_loss=0.1223, pruned_loss=0.01704, audio_tagging_loss=0.007002, over 15974.00 frames. ], tot_loss[loss=0.06453, simple_loss=0.08837, pruned_loss=0.01176, audio_tagging_loss=0.008585, over 3038771.28 frames. 
], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:58:45,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3822893.3333333335, ans=0.125 2023-11-27 08:58:57,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3822960.0, ans=0.125 2023-11-27 08:59:01,358 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573450 2023-11-27 08:59:05,535 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.009e+01 8.971e+01 9.543e+01 1.035e+02 1.385e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-27 08:59:11,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3823026.6666666665, ans=0.0 2023-11-27 08:59:12,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3823026.6666666665, ans=0.125 2023-11-27 08:59:13,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3823026.6666666665, ans=0.125 2023-11-27 08:59:18,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3823093.3333333335, ans=0.0 2023-11-27 08:59:30,189 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8350, loss[loss=0.07761, simple_loss=0.1043, pruned_loss=0.01704, audio_tagging_loss=0.008409, over 15147.00 frames. ], tot_loss[loss=0.06417, simple_loss=0.088, pruned_loss=0.01162, audio_tagging_loss=0.008551, over 3046427.80 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:59:41,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3823226.6666666665, ans=0.125 2023-11-27 08:59:48,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3823226.6666666665, ans=0.0 2023-11-27 08:59:56,887 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573500 2023-11-27 09:00:19,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3823426.6666666665, ans=0.125 2023-11-27 09:00:25,950 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8400, loss[loss=0.07559, simple_loss=0.1001, pruned_loss=0.01748, audio_tagging_loss=0.00807, over 15230.00 frames. ], tot_loss[loss=0.06447, simple_loss=0.08841, pruned_loss=0.0117, audio_tagging_loss=0.008565, over 3035229.65 frames. 
], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:00:26,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3823493.3333333335, ans=0.0 2023-11-27 09:00:48,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3823626.6666666665, ans=0.125 2023-11-27 09:00:52,709 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573550 2023-11-27 09:00:56,848 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 8.941e+01 9.645e+01 1.032e+02 1.251e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-27 09:01:14,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3823760.0, ans=0.125 2023-11-27 09:01:14,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3823760.0, ans=0.2 2023-11-27 09:01:21,035 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8450, loss[loss=0.06839, simple_loss=0.1023, pruned_loss=0.0123, audio_tagging_loss=0.004963, over 14598.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.0888, pruned_loss=0.01176, audio_tagging_loss=0.008506, over 3048021.69 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:01:35,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3823893.3333333335, ans=0.1 2023-11-27 09:01:40,575 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.73 vs. limit=15.0 2023-11-27 09:01:47,443 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573600 2023-11-27 09:01:49,268 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.04 vs. limit=15.0 2023-11-27 09:01:52,595 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.75 vs. limit=22.5 2023-11-27 09:02:11,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3824093.3333333335, ans=0.0 2023-11-27 09:02:15,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3824093.3333333335, ans=0.2 2023-11-27 09:02:16,972 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8500, loss[loss=0.07105, simple_loss=0.102, pruned_loss=0.01341, audio_tagging_loss=0.006635, over 16613.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08887, pruned_loss=0.01179, audio_tagging_loss=0.008501, over 3048304.44 frames. 
], batch size: 60, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:02:19,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3824160.0, ans=0.0 2023-11-27 09:02:24,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3824160.0, ans=0.125 2023-11-27 09:02:30,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3824226.6666666665, ans=0.125 2023-11-27 09:02:38,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3824293.3333333335, ans=0.0 2023-11-27 09:02:39,704 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.89 vs. limit=6.0 2023-11-27 09:02:43,618 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573650 2023-11-27 09:02:48,207 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 9.075e+01 9.563e+01 1.041e+02 1.324e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-27 09:02:49,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3824360.0, ans=0.125 2023-11-27 09:02:59,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3824360.0, ans=0.04949747468305833 2023-11-27 09:03:12,098 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8550, loss[loss=0.07333, simple_loss=0.1072, pruned_loss=0.01335, audio_tagging_loss=0.006382, over 15384.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08832, pruned_loss=0.01172, audio_tagging_loss=0.008633, over 3051946.00 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:03:16,585 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.07 vs. limit=15.0 2023-11-27 09:03:39,732 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573700 2023-11-27 09:04:08,393 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8600, loss[loss=0.06581, simple_loss=0.08878, pruned_loss=0.01101, audio_tagging_loss=0.01041, over 15246.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08887, pruned_loss=0.01187, audio_tagging_loss=0.008602, over 3043818.89 frames. 
], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:04:10,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3824826.6666666665, ans=0.125 2023-11-27 09:04:20,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3824893.3333333335, ans=0.125 2023-11-27 09:04:31,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3824960.0, ans=0.125 2023-11-27 09:04:34,976 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573750 2023-11-27 09:04:39,602 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.804e+01 9.143e+01 9.907e+01 1.055e+02 1.409e+02, threshold=1.981e+02, percent-clipped=0.0 2023-11-27 09:04:48,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3825026.6666666665, ans=0.1 2023-11-27 09:05:04,577 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8650, loss[loss=0.07353, simple_loss=0.1116, pruned_loss=0.01187, audio_tagging_loss=0.00589, over 15055.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08925, pruned_loss=0.0121, audio_tagging_loss=0.008647, over 3051500.95 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:05:12,469 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.16 vs. limit=15.0 2023-11-27 09:05:15,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3825226.6666666665, ans=0.1 2023-11-27 09:05:17,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3825226.6666666665, ans=0.0 2023-11-27 09:05:30,636 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573800 2023-11-27 09:06:00,048 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8700, loss[loss=0.06234, simple_loss=0.08048, pruned_loss=0.009814, audio_tagging_loss=0.01228, over 14640.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08945, pruned_loss=0.01216, audio_tagging_loss=0.008654, over 3047504.86 frames. 
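Each per-batch loss[...] entry decomposes into the pruned-transducer terms plus the audio-tagging distillation term: with simple_loss_scale=0.5 and audio_tagging_loss_scale=1.0 from this run's configuration, loss = 0.5·simple_loss + pruned_loss + 1.0·audio_tagging_loss. A quick check against the batch 8700 entry just above:

```python
# Recombining the logged components of the Epoch 48, batch 8700 entry
# (loss=0.06234, simple_loss=0.08048, pruned_loss=0.009814,
# audio_tagging_loss=0.01228); the scales come from the run configuration.
simple_loss, pruned_loss, audio_tagging_loss = 0.08048, 0.009814, 0.01228
loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss
print(f"{loss:.5f}")  # 0.06233, matching the logged loss=0.06234 up to rounding
```

The same identity holds for the running tot_loss[...] averages (e.g. 0.5·0.08845 + 0.0116 + 0.008583 ≈ 0.06441), which are frame-weighted over roughly the last two hundred batches, judging by the ~3M-frame totals against ~15k frames per batch.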
], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:06:03,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3825493.3333333335, ans=0.0 2023-11-27 09:06:09,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3825493.3333333335, ans=0.0 2023-11-27 09:06:15,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3825560.0, ans=0.125 2023-11-27 09:06:16,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3825560.0, ans=0.0 2023-11-27 09:06:27,593 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573850 2023-11-27 09:06:32,838 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.045e+01 9.211e+01 9.788e+01 1.039e+02 1.317e+02, threshold=1.958e+02, percent-clipped=0.0 2023-11-27 09:06:35,210 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:06:35,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3825693.3333333335, ans=0.2 2023-11-27 09:06:55,780 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8750, loss[loss=0.07092, simple_loss=0.0976, pruned_loss=0.01367, audio_tagging_loss=0.008455, over 15431.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08982, pruned_loss=0.01222, audio_tagging_loss=0.008742, over 3046780.54 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:07:13,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3825893.3333333335, ans=0.0 2023-11-27 09:07:17,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3825960.0, ans=0.2 2023-11-27 09:07:22,903 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573900 2023-11-27 09:07:41,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3826093.3333333335, ans=0.125 2023-11-27 09:07:51,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3826160.0, ans=0.125 2023-11-27 09:07:52,414 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8800, loss[loss=0.05796, simple_loss=0.07704, pruned_loss=0.01044, audio_tagging_loss=0.008994, over 15699.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09098, pruned_loss=0.01236, audio_tagging_loss=0.008734, over 3052303.38 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:07:54,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3826160.0, ans=0.125 2023-11-27 09:07:55,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3826160.0, ans=0.1 2023-11-27 09:07:58,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3826160.0, ans=0.125 2023-11-27 09:08:17,996 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.87 vs. 
limit=22.5 2023-11-27 09:08:18,408 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 573950 2023-11-27 09:08:23,600 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.888e+01 9.347e+01 1.002e+02 1.077e+02 1.340e+02, threshold=2.003e+02, percent-clipped=0.0 2023-11-27 09:08:24,910 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:08:29,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3826360.0, ans=0.1 2023-11-27 09:08:47,541 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8850, loss[loss=0.04345, simple_loss=0.05718, pruned_loss=0.005606, audio_tagging_loss=0.009253, over 14893.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09066, pruned_loss=0.01227, audio_tagging_loss=0.008782, over 3058866.14 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:08:49,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3826493.3333333335, ans=10.0 2023-11-27 09:08:58,677 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 09:09:02,417 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.17 vs. limit=22.5 2023-11-27 09:09:13,959 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574000 2023-11-27 09:09:20,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3826693.3333333335, ans=0.125 2023-11-27 09:09:24,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3826693.3333333335, ans=0.125 2023-11-27 09:09:36,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3826760.0, ans=0.125 2023-11-27 09:09:42,720 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8900, loss[loss=0.09037, simple_loss=0.1255, pruned_loss=0.02135, audio_tagging_loss=0.006287, over 14403.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09023, pruned_loss=0.01237, audio_tagging_loss=0.008752, over 3063176.03 frames. 
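The train_asr.py:1481 warnings are the data sanity filter firing on AudioSet clips: every such cut carries the same 24-token dummy transcript, but a 1.0-second clip yields only 100 feature frames, i.e. 23 frames after the 4× subsampling, and the pruned transducer loss effectively needs at least as many encoder frames as output tokens. A sketch of the check, with the subsampling arithmetic chosen to reproduce the logged 100 → 23 mapping (icefall's exact front-end formula may differ):

```python
# Exclude cuts whose encoder output is too short to cover their token
# sequence; the transducer cannot emit 24 tokens over 23 frames.
def frames_after_subsampling(num_frames: int) -> int:
    # Conv front-end arithmetic assumed here; it maps 100 -> 23 as in the log.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23
print(keep_cut(100, 24))              # False -> "Exclude cut ..." is logged
```

Only the 1-second unbalanced/*_0.000_1.000.wav cuts trip this; longer AudioSet clips carry the same dummy text but have enough frames to pass.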
], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:10:00,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3826893.3333333335, ans=0.125 2023-11-27 09:10:09,991 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574050 2023-11-27 09:10:16,221 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.150e+01 9.024e+01 9.616e+01 1.025e+02 1.217e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-27 09:10:16,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3827026.6666666665, ans=0.0 2023-11-27 09:10:31,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3827093.3333333335, ans=0.125 2023-11-27 09:10:38,536 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 8950, loss[loss=0.06013, simple_loss=0.08304, pruned_loss=0.0109, audio_tagging_loss=0.007713, over 14885.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.0897, pruned_loss=0.01225, audio_tagging_loss=0.008597, over 3058244.43 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:10:55,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3827226.6666666665, ans=0.125 2023-11-27 09:11:05,624 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574100 2023-11-27 09:11:12,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3827360.0, ans=0.125 2023-11-27 09:11:14,615 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.75 vs. limit=12.0 2023-11-27 09:11:21,969 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.41 vs. limit=15.0 2023-11-27 09:11:33,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3827493.3333333335, ans=0.125 2023-11-27 09:11:34,691 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9000, loss[loss=0.0662, simple_loss=0.08922, pruned_loss=0.0105, audio_tagging_loss=0.0111, over 14682.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.09008, pruned_loss=0.01216, audio_tagging_loss=0.008553, over 3061620.81 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:11:34,693 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-27 09:11:59,912 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.0361, 5.8808, 5.6658, 5.5840], device='cuda:0') 2023-11-27 09:12:07,534 INFO [train_asr.py:1267] (0/4) Epoch 48, validation: loss=0.05893, simple_loss=0.05035, pruned_loss=0.005253, audio_tagging_loss=0.0285, over 4681554.00 frames. 
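The block above is a periodic validation pass: training pauses, the same loss components are averaged over the whole dev set (note the much larger audio_tagging_loss and smaller simple_loss than the training averages), and peak CUDA memory is reported on the next line. A minimal sketch of that bookkeeping; compute_loss and the dataloader are placeholders, not this script's actual helpers:

```python
import torch

# Periodic validation: eval mode, average losses over the dev loader,
# then report peak GPU memory (cf. "Maximum memory allocated so far ...").
def run_validation(model, valid_dl, device) -> float:
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_dl:
            loss, num_frames = compute_loss(model, batch, device)  # hypothetical helper
            tot_loss += loss.item()
            tot_frames += num_frames
    model.train()
    return tot_loss / max(tot_frames, 1.0)  # frame-normalized, as in the log

if torch.cuda.is_available():
    peak_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
    print(f"Maximum memory allocated so far is {peak_mb}MB")
```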
2023-11-27 09:12:07,535 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-27 09:12:34,560 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574150 2023-11-27 09:12:40,848 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.883e+01 9.087e+01 9.718e+01 1.070e+02 1.602e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-27 09:12:43,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3827693.3333333335, ans=0.05 2023-11-27 09:13:00,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3827760.0, ans=0.125 2023-11-27 09:13:03,709 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9050, loss[loss=0.08686, simple_loss=0.1191, pruned_loss=0.0199, audio_tagging_loss=0.007399, over 15469.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.09033, pruned_loss=0.01221, audio_tagging_loss=0.008485, over 3058341.48 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:13:26,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3827960.0, ans=0.125 2023-11-27 09:13:30,168 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574200 2023-11-27 09:13:33,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3827960.0, ans=0.0 2023-11-27 09:13:51,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3828093.3333333335, ans=0.125 2023-11-27 09:13:55,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3828093.3333333335, ans=0.125 2023-11-27 09:13:59,483 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9100, loss[loss=0.06593, simple_loss=0.09421, pruned_loss=0.01373, audio_tagging_loss=0.005093, over 14993.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09067, pruned_loss=0.01231, audio_tagging_loss=0.008394, over 3062407.16 frames. 
], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:14:02,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3828160.0, ans=0.125 2023-11-27 09:14:14,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3828226.6666666665, ans=0.0 2023-11-27 09:14:18,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3828226.6666666665, ans=0.125 2023-11-27 09:14:19,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3828226.6666666665, ans=0.125 2023-11-27 09:14:26,644 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574250 2023-11-27 09:14:27,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3828293.3333333335, ans=0.0 2023-11-27 09:14:29,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3828293.3333333335, ans=0.125 2023-11-27 09:14:32,878 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.883e+01 9.030e+01 9.534e+01 1.010e+02 1.225e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 09:14:55,588 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9150, loss[loss=0.0918, simple_loss=0.1209, pruned_loss=0.02163, audio_tagging_loss=0.009741, over 15428.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.09024, pruned_loss=0.01215, audio_tagging_loss=0.008384, over 3057820.72 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:15:01,419 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.93 vs. limit=22.5 2023-11-27 09:15:03,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3828493.3333333335, ans=0.125 2023-11-27 09:15:15,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3828560.0, ans=0.125 2023-11-27 09:15:22,654 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574300 2023-11-27 09:15:40,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3828760.0, ans=0.125 2023-11-27 09:15:46,053 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.61 vs. limit=15.0 2023-11-27 09:15:51,859 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9200, loss[loss=0.06361, simple_loss=0.09073, pruned_loss=0.01211, audio_tagging_loss=0.006133, over 14616.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08985, pruned_loss=0.01201, audio_tagging_loss=0.008253, over 3058201.07 frames. 
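Throughout this stretch the lr field is pinned at 1.40e-03, which is the expected behaviour of the Eden scheduler this deep into training: both of its decay factors are quarter-power laws in the batch and epoch counts, so between consecutive log lines they change by far less than the two displayed digits. A sketch of the schedule below; the formula and the epoch indexing (47 reproduces the logged value) are reconstructed from memory, not quoted from this run's optim.py:

```python
# Eden-style learning rate: base_lr damped by power-law factors in the
# batch and epoch counts (lr_batches/lr_epochs from the run configuration).
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"lr: {eden_lr(0.045, batch=574300, epoch=47):.2e}")  # lr: 1.40e-03
```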
], batch size: 54, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:15:52,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3828826.6666666665, ans=0.125 2023-11-27 09:16:05,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3828893.3333333335, ans=0.125 2023-11-27 09:16:08,934 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.55 vs. limit=6.0 2023-11-27 09:16:13,398 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.35 vs. limit=22.5 2023-11-27 09:16:18,613 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574350 2023-11-27 09:16:24,821 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.684e+01 9.042e+01 9.589e+01 1.020e+02 1.357e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 09:16:26,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3829026.6666666665, ans=0.125 2023-11-27 09:16:27,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3829026.6666666665, ans=0.0 2023-11-27 09:16:38,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3829093.3333333335, ans=0.125 2023-11-27 09:16:46,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3829160.0, ans=0.0 2023-11-27 09:16:47,642 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9250, loss[loss=0.05156, simple_loss=0.06833, pruned_loss=0.008188, audio_tagging_loss=0.009203, over 16760.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08997, pruned_loss=0.01198, audio_tagging_loss=0.008351, over 3061046.93 frames. ], batch size: 65, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:16:47,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3829160.0, ans=0.5 2023-11-27 09:17:00,308 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.87 vs. limit=15.0 2023-11-27 09:17:01,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3829226.6666666665, ans=0.125 2023-11-27 09:17:01,152 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:17:06,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3829226.6666666665, ans=0.125 2023-11-27 09:17:14,294 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574400 2023-11-27 09:17:23,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3829360.0, ans=0.1 2023-11-27 09:17:43,316 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9300, loss[loss=0.06511, simple_loss=0.08378, pruned_loss=0.01301, audio_tagging_loss=0.01021, over 16178.00 frames. 
], tot_loss[loss=0.06465, simple_loss=0.08883, pruned_loss=0.01179, audio_tagging_loss=0.008436, over 3052622.54 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:18:03,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3829560.0, ans=0.125 2023-11-27 09:18:09,924 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574450 2023-11-27 09:18:10,406 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.16 vs. limit=15.0 2023-11-27 09:18:16,656 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 9.342e+01 9.838e+01 1.063e+02 1.386e+02, threshold=1.968e+02, percent-clipped=0.0 2023-11-27 09:18:24,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3829693.3333333335, ans=0.2 2023-11-27 09:18:36,081 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.40 vs. limit=15.0 2023-11-27 09:18:38,921 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9350, loss[loss=0.05986, simple_loss=0.08396, pruned_loss=0.008255, audio_tagging_loss=0.009629, over 15984.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08875, pruned_loss=0.01191, audio_tagging_loss=0.008519, over 3054089.00 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:18:47,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3829826.6666666665, ans=0.0 2023-11-27 09:18:54,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3829893.3333333335, ans=0.1 2023-11-27 09:19:03,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3829960.0, ans=0.125 2023-11-27 09:19:05,626 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574500 2023-11-27 09:19:25,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3830093.3333333335, ans=0.125 2023-11-27 09:19:34,651 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9400, loss[loss=0.05545, simple_loss=0.07107, pruned_loss=0.008417, audio_tagging_loss=0.0115, over 15085.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08886, pruned_loss=0.01186, audio_tagging_loss=0.008619, over 3052320.08 frames. 
], batch size: 58, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:19:42,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3830160.0, ans=0.0 2023-11-27 09:19:51,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3830226.6666666665, ans=0.015 2023-11-27 09:19:51,877 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:20:00,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3830293.3333333335, ans=0.0 2023-11-27 09:20:01,319 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574550 2023-11-27 09:20:08,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3830360.0, ans=0.125 2023-11-27 09:20:09,696 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.781e+01 8.905e+01 9.680e+01 1.031e+02 1.220e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-27 09:20:14,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3830360.0, ans=0.1 2023-11-27 09:20:22,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3830426.6666666665, ans=0.07 2023-11-27 09:20:23,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3830426.6666666665, ans=0.1 2023-11-27 09:20:29,236 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 09:20:30,794 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9450, loss[loss=0.08341, simple_loss=0.1201, pruned_loss=0.0167, audio_tagging_loss=0.006667, over 15429.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08888, pruned_loss=0.01207, audio_tagging_loss=0.008711, over 3052637.59 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:20:34,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3830493.3333333335, ans=0.2 2023-11-27 09:20:43,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3830560.0, ans=0.125 2023-11-27 09:20:57,548 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574600 2023-11-27 09:21:09,680 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.34 vs. limit=12.0 2023-11-27 09:21:26,388 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9500, loss[loss=0.08489, simple_loss=0.1171, pruned_loss=0.01817, audio_tagging_loss=0.008172, over 15848.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.0889, pruned_loss=0.01204, audio_tagging_loss=0.008801, over 3050101.91 frames. 
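The scaling.py:1022 lines report each Whiten module's whitening metric against its limit: the statistic is about 1.0 when the grouped feature covariance is a multiple of the identity and grows as channels become correlated or unevenly scaled. One plausible way to compute such a statistic is sketched below; it mirrors the intent of icefall's Whiten module, not necessarily its exact formula:

```python
import torch

# Whitening metric: ~1.0 for white features, larger when the per-group
# covariance is far from a scaled identity.
def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    num_frames, num_channels = x.shape
    cpg = num_channels // num_groups  # channels per group
    xg = x.reshape(num_frames, num_groups, cpg).transpose(0, 1)
    cov = torch.matmul(xg.transpose(1, 2), xg) / num_frames  # (groups, cpg, cpg)
    mean_diag = cov.diagonal(dim1=1, dim2=2).mean()
    mean_sq = (cov ** 2).mean() * cpg
    return (mean_sq / (mean_diag ** 2 + 1e-20)).item()

print(round(whitening_metric(torch.randn(10000, 192)), 2))  # close to 1.0
```

On that reading, values such as metric=4.34 vs. limit=12.0 are comfortable, and the limit marks the point past which the module starts penalizing the activations back toward whiteness.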
], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:21:28,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3830826.6666666665, ans=0.125 2023-11-27 09:21:31,759 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.90 vs. limit=12.0 2023-11-27 09:21:48,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3830960.0, ans=0.07 2023-11-27 09:21:49,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3830960.0, ans=0.1 2023-11-27 09:21:53,249 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574650 2023-11-27 09:22:01,092 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.673e+01 9.168e+01 9.748e+01 1.058e+02 1.599e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-27 09:22:02,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3831026.6666666665, ans=0.0 2023-11-27 09:22:13,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3831093.3333333335, ans=0.125 2023-11-27 09:22:16,322 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.06 vs. limit=15.0 2023-11-27 09:22:20,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3831093.3333333335, ans=0.1 2023-11-27 09:22:22,116 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9550, loss[loss=0.08782, simple_loss=0.125, pruned_loss=0.01739, audio_tagging_loss=0.007937, over 15965.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08935, pruned_loss=0.01201, audio_tagging_loss=0.008888, over 3049738.74 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:22:25,880 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.03 vs. limit=15.0 2023-11-27 09:22:38,635 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.06 vs. limit=22.5 2023-11-27 09:22:39,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3831226.6666666665, ans=0.1 2023-11-27 09:22:43,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3831293.3333333335, ans=0.125 2023-11-27 09:22:43,930 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.33 vs. limit=15.0 2023-11-27 09:22:48,527 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574700 2023-11-27 09:22:54,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3831360.0, ans=0.125 2023-11-27 09:23:16,926 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9600, loss[loss=0.06115, simple_loss=0.07855, pruned_loss=0.01427, audio_tagging_loss=0.007607, over 13968.00 frames. 
], tot_loss[loss=0.06524, simple_loss=0.08881, pruned_loss=0.0119, audio_tagging_loss=0.008933, over 3053688.96 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:23:30,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3831560.0, ans=0.2 2023-11-27 09:23:35,098 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:23:40,228 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.77 vs. limit=15.0 2023-11-27 09:23:44,157 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574750 2023-11-27 09:23:48,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3831626.6666666665, ans=0.1 2023-11-27 09:23:51,671 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.142e+01 9.086e+01 9.692e+01 1.047e+02 1.227e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-27 09:24:10,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3831760.0, ans=0.0 2023-11-27 09:24:12,899 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9650, loss[loss=0.08027, simple_loss=0.1214, pruned_loss=0.01219, audio_tagging_loss=0.007396, over 16525.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08973, pruned_loss=0.01201, audio_tagging_loss=0.008815, over 3048578.27 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:24:20,495 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.64 vs. limit=22.5 2023-11-27 09:24:39,570 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574800 2023-11-27 09:24:47,340 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.91 vs. limit=6.0 2023-11-27 09:25:09,692 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9700, loss[loss=0.06924, simple_loss=0.09685, pruned_loss=0.01213, audio_tagging_loss=0.00869, over 14620.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08978, pruned_loss=0.01188, audio_tagging_loss=0.008656, over 3047909.47 frames. 
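The grad_scale field flips between 16.0 and 32.0 across these batches, which is the automatic mixed-precision loss scale at work: the scaler grows the scale after a long enough run of overflow-free steps and halves it whenever an inf/nan gradient forces a skipped step. A minimal sketch of those mechanics with torch.cuda.amp; model, optimizer and loss_fn are placeholders:

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

def training_step(model, optimizer, batch, loss_fn) -> float:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(batch))
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # unscales grads; skips the step on inf/nan
    scaler.update()                 # halve the scale on overflow, else grow it
    return scaler.get_scale()       # the value logged as grad_scale
```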
], batch size: 53, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:25:09,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3832160.0, ans=0.125 2023-11-27 09:25:22,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3832226.6666666665, ans=0.125 2023-11-27 09:25:24,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3832226.6666666665, ans=0.0 2023-11-27 09:25:36,483 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574850 2023-11-27 09:25:44,493 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.839e+01 9.066e+01 9.770e+01 1.059e+02 1.296e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-27 09:25:45,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3832360.0, ans=0.125 2023-11-27 09:26:05,304 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9750, loss[loss=0.07989, simple_loss=0.09663, pruned_loss=0.02139, audio_tagging_loss=0.01018, over 15322.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.0904, pruned_loss=0.01198, audio_tagging_loss=0.008571, over 3049499.66 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:26:11,615 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.58 vs. limit=15.0 2023-11-27 09:26:32,886 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574900 2023-11-27 09:27:01,003 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9800, loss[loss=0.06872, simple_loss=0.09352, pruned_loss=0.01411, audio_tagging_loss=0.007847, over 14873.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08966, pruned_loss=0.01198, audio_tagging_loss=0.008558, over 3049692.70 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:27:09,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3832826.6666666665, ans=0.0 2023-11-27 09:27:12,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3832893.3333333335, ans=0.125 2023-11-27 09:27:16,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3832893.3333333335, ans=0.125 2023-11-27 09:27:27,257 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:27:28,150 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 574950 2023-11-27 09:27:29,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3832960.0, ans=0.1 2023-11-27 09:27:32,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3832960.0, ans=0.1 2023-11-27 09:27:36,086 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.628e+01 9.044e+01 9.762e+01 1.048e+02 1.288e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-27 09:27:51,900 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 09:27:57,775 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9850, loss[loss=0.06041, simple_loss=0.08213, pruned_loss=0.01072, audio_tagging_loss=0.008628, over 16216.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.09039, pruned_loss=0.01204, audio_tagging_loss=0.008516, over 3049396.64 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:28:10,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3833226.6666666665, ans=0.125 2023-11-27 09:28:11,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3833226.6666666665, ans=0.1 2023-11-27 09:28:22,990 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=22.5 2023-11-27 09:28:23,731 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575000 2023-11-27 09:28:35,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3833360.0, ans=0.125 2023-11-27 09:28:39,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3833360.0, ans=0.125 2023-11-27 09:28:40,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3833360.0, ans=0.0 2023-11-27 09:28:42,292 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.80 vs. limit=22.5 2023-11-27 09:28:45,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3833426.6666666665, ans=0.0 2023-11-27 09:28:50,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3833426.6666666665, ans=0.0 2023-11-27 09:28:53,336 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9900, loss[loss=0.05436, simple_loss=0.06582, pruned_loss=0.01106, audio_tagging_loss=0.01039, over 15790.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09101, pruned_loss=0.01228, audio_tagging_loss=0.008468, over 3048507.17 frames. ], batch size: 64, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:28:54,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3833493.3333333335, ans=0.125 2023-11-27 09:28:54,981 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.21 vs. limit=15.0 2023-11-27 09:28:57,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3833493.3333333335, ans=0.125 2023-11-27 09:29:00,760 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.32 vs. 
limit=12.0 2023-11-27 09:29:21,162 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575050 2023-11-27 09:29:21,743 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.18 vs. limit=15.0 2023-11-27 09:29:28,628 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.548e+01 9.072e+01 9.748e+01 1.058e+02 2.513e+02, threshold=1.950e+02, percent-clipped=1.0 2023-11-27 09:29:36,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3833693.3333333335, ans=0.2 2023-11-27 09:29:49,349 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 9950, loss[loss=0.06838, simple_loss=0.09442, pruned_loss=0.0133, audio_tagging_loss=0.007872, over 16510.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.0903, pruned_loss=0.012, audio_tagging_loss=0.008462, over 3049749.36 frames. ], batch size: 62, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:29:51,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3833826.6666666665, ans=0.07 2023-11-27 09:29:54,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3833826.6666666665, ans=0.125 2023-11-27 09:30:01,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3833893.3333333335, ans=0.05 2023-11-27 09:30:15,996 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575100 2023-11-27 09:30:25,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3834026.6666666665, ans=0.125 2023-11-27 09:30:40,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3834093.3333333335, ans=0.2 2023-11-27 09:30:45,631 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10000, loss[loss=0.07973, simple_loss=0.1182, pruned_loss=0.01437, audio_tagging_loss=0.006273, over 14980.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08864, pruned_loss=0.0117, audio_tagging_loss=0.008482, over 3047442.99 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:30:49,436 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.73 vs. 
limit=12.0 2023-11-27 09:30:52,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3834160.0, ans=0.2 2023-11-27 09:30:57,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3834226.6666666665, ans=0.0 2023-11-27 09:31:11,720 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575150 2023-11-27 09:31:17,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3834360.0, ans=0.125 2023-11-27 09:31:21,703 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 9.035e+01 9.520e+01 1.022e+02 1.313e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-27 09:31:22,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3834360.0, ans=0.07 2023-11-27 09:31:23,435 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=15.0 2023-11-27 09:31:39,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3834493.3333333335, ans=10.0 2023-11-27 09:31:40,771 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10050, loss[loss=0.05008, simple_loss=0.06865, pruned_loss=0.009211, audio_tagging_loss=0.006546, over 15691.00 frames. ], tot_loss[loss=0.06429, simple_loss=0.08866, pruned_loss=0.01152, audio_tagging_loss=0.008436, over 3046437.27 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:31:47,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3834493.3333333335, ans=0.1 2023-11-27 09:32:07,302 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575200 2023-11-27 09:32:08,326 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.05 vs. limit=22.5 2023-11-27 09:32:18,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3834693.3333333335, ans=0.125 2023-11-27 09:32:21,790 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.47 vs. limit=15.0 2023-11-27 09:32:36,521 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10100, loss[loss=0.0751, simple_loss=0.1072, pruned_loss=0.01306, audio_tagging_loss=0.008449, over 15256.00 frames. ], tot_loss[loss=0.06436, simple_loss=0.0888, pruned_loss=0.01148, audio_tagging_loss=0.008484, over 3047921.36 frames. 
], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:32:40,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3834826.6666666665, ans=0.0 2023-11-27 09:32:46,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3834893.3333333335, ans=0.0 2023-11-27 09:32:52,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3834893.3333333335, ans=0.125 2023-11-27 09:32:57,484 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.26 vs. limit=10.0 2023-11-27 09:33:03,269 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575250 2023-11-27 09:33:04,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3834960.0, ans=0.125 2023-11-27 09:33:12,688 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.819e+01 8.969e+01 9.511e+01 1.051e+02 1.642e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-27 09:33:21,664 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 09:33:31,813 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10150, loss[loss=0.0531, simple_loss=0.06541, pruned_loss=0.008515, audio_tagging_loss=0.01188, over 15303.00 frames. ], tot_loss[loss=0.06419, simple_loss=0.08842, pruned_loss=0.01137, audio_tagging_loss=0.0086, over 3047936.11 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:33:33,463 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.81 vs. limit=12.0 2023-11-27 09:33:35,608 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.26 vs. limit=15.0 2023-11-27 09:33:59,251 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 09:33:59,290 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575300 2023-11-27 09:34:02,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3835293.3333333335, ans=0.1 2023-11-27 09:34:05,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3835360.0, ans=0.1 2023-11-27 09:34:15,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3835426.6666666665, ans=0.0 2023-11-27 09:34:28,272 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10200, loss[loss=0.05805, simple_loss=0.07711, pruned_loss=0.009519, audio_tagging_loss=0.00998, over 13453.00 frames. ], tot_loss[loss=0.06419, simple_loss=0.08794, pruned_loss=0.01154, audio_tagging_loss=0.008685, over 3041385.59 frames. ], batch size: 51, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:34:48,930 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 09:34:54,830 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575350 2023-11-27 09:35:01,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3835693.3333333335, ans=22.5 2023-11-27 09:35:02,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3835693.3333333335, ans=0.125 2023-11-27 09:35:06,661 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.675e+01 9.149e+01 9.728e+01 1.044e+02 1.552e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-27 09:35:11,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3835693.3333333335, ans=0.0 2023-11-27 09:35:24,029 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10250, loss[loss=0.05915, simple_loss=0.08299, pruned_loss=0.008948, audio_tagging_loss=0.008705, over 14328.00 frames. ], tot_loss[loss=0.06401, simple_loss=0.08764, pruned_loss=0.01147, audio_tagging_loss=0.00872, over 3049809.91 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 09:35:24,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3835826.6666666665, ans=0.125 2023-11-27 09:35:28,746 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.94 vs. 
limit=15.0 2023-11-27 09:35:50,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3835960.0, ans=0.125 2023-11-27 09:35:51,208 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575400 2023-11-27 09:35:53,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3835960.0, ans=0.125 2023-11-27 09:36:15,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3836093.3333333335, ans=0.0 2023-11-27 09:36:19,990 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10300, loss[loss=0.06438, simple_loss=0.07312, pruned_loss=0.01415, audio_tagging_loss=0.01367, over 15521.00 frames. ], tot_loss[loss=0.06444, simple_loss=0.08815, pruned_loss=0.01167, audio_tagging_loss=0.008692, over 3054550.55 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 09:36:36,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3836226.6666666665, ans=0.0 2023-11-27 09:36:46,346 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575450 2023-11-27 09:36:52,809 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.26 vs. limit=10.0 2023-11-27 09:36:57,296 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.710e+01 8.883e+01 9.810e+01 1.061e+02 1.854e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-27 09:37:16,036 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10350, loss[loss=0.07887, simple_loss=0.1089, pruned_loss=0.01624, audio_tagging_loss=0.008175, over 14928.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08939, pruned_loss=0.01172, audio_tagging_loss=0.008782, over 3057144.10 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 09:37:22,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3836493.3333333335, ans=0.125 2023-11-27 09:37:28,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3836560.0, ans=0.1 2023-11-27 09:37:42,158 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575500 2023-11-27 09:37:56,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3836693.3333333335, ans=0.1 2023-11-27 09:38:11,114 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10400, loss[loss=0.08255, simple_loss=0.1194, pruned_loss=0.01638, audio_tagging_loss=0.006489, over 14968.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08972, pruned_loss=0.01185, audio_tagging_loss=0.008813, over 3054553.59 frames. 
], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:38:23,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3836893.3333333335, ans=0.125 2023-11-27 09:38:24,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3836893.3333333335, ans=15.0 2023-11-27 09:38:38,256 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575550 2023-11-27 09:38:48,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3837026.6666666665, ans=0.1 2023-11-27 09:38:49,401 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.735e+01 9.167e+01 9.734e+01 1.047e+02 2.020e+02, threshold=1.947e+02, percent-clipped=1.0 2023-11-27 09:38:53,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3837026.6666666665, ans=0.0 2023-11-27 09:39:06,811 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10450, loss[loss=0.08465, simple_loss=0.1205, pruned_loss=0.01711, audio_tagging_loss=0.00728, over 15237.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08949, pruned_loss=0.01181, audio_tagging_loss=0.008787, over 3046998.51 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:39:15,448 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:39:27,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3837226.6666666665, ans=0.125 2023-11-27 09:39:33,886 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575600 2023-11-27 09:39:37,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3837293.3333333335, ans=0.05 2023-11-27 09:40:03,168 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10500, loss[loss=0.05174, simple_loss=0.06494, pruned_loss=0.009203, audio_tagging_loss=0.01007, over 14980.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08935, pruned_loss=0.01182, audio_tagging_loss=0.008667, over 3050945.36 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:40:17,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3837560.0, ans=0.1 2023-11-27 09:40:21,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3837560.0, ans=0.125 2023-11-27 09:40:29,785 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575650 2023-11-27 09:40:32,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3837626.6666666665, ans=0.1 2023-11-27 09:40:41,393 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.578e+01 8.939e+01 9.707e+01 1.019e+02 1.510e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-27 09:40:58,855 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10550, loss[loss=0.05801, simple_loss=0.07401, pruned_loss=0.0113, audio_tagging_loss=0.009703, over 16346.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08884, pruned_loss=0.01175, audio_tagging_loss=0.008533, over 3046375.14 frames. 
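], batch size: 64, lr: 1.40e-03, grad_scale: 16.0

The "Clipping_scale=2.0, grad-norm quartiles ... threshold=... percent-clipped=..." records summarize recent gradient norms: the five numbers read as min/25%/50%/75%/max of a window of recent norms, the threshold equals clipping_scale times the median (2.0 x 9.734e+01 = 1.947e+02 in the record above, the one group in this stretch with percent-clipped > 0), and percent-clipped is the share of recent batches whose norm exceeded it. A rough sketch of how such a summary can be produced, as a simplification rather than the actual optim.py implementation:

    import torch

    def summarize_grad_norms(norms: torch.Tensor, clipping_scale: float = 2.0):
        # norms: 1-D tensor of gradient norms from recent optimizer steps.
        sorted_norms = norms.sort().values
        n = sorted_norms.numel()
        quartiles = [sorted_norms[min(n - 1, (i * n) // 4)].item() for i in range(5)]
        threshold = clipping_scale * quartiles[2]  # 2x the median grad norm
        percent_clipped = 100.0 * (norms > threshold).float().mean().item()
        return quartiles, threshold, percent_clipped
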
2023-11-27 09:41:05,924 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-27 09:41:24,012 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.30 vs. limit=10.0
2023-11-27 09:41:25,811 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575700
2023-11-27 09:41:36,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3838026.6666666665, ans=0.1
2023-11-27 09:41:38,228 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.04 vs. limit=12.0
2023-11-27 09:41:41,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3838026.6666666665, ans=0.0
2023-11-27 09:41:54,083 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10600, loss[loss=0.08231, simple_loss=0.1179, pruned_loss=0.01513, audio_tagging_loss=0.008253, over 14364.00 frames. ], tot_loss[loss=0.06442, simple_loss=0.08841, pruned_loss=0.01172, audio_tagging_loss=0.008495, over 3046432.01 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 09:42:06,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3838226.6666666665, ans=0.2
2023-11-27 09:42:06,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3838226.6666666665, ans=0.0
2023-11-27 09:42:20,804 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575750
2023-11-27 09:42:31,645 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.836e+01 9.017e+01 9.530e+01 1.017e+02 1.584e+02, threshold=1.906e+02, percent-clipped=0.0
2023-11-27 09:42:44,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3838426.6666666665, ans=0.0
2023-11-27 09:42:45,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3838426.6666666665, ans=0.07
2023-11-27 09:42:49,580 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10650, loss[loss=0.08214, simple_loss=0.1191, pruned_loss=0.01592, audio_tagging_loss=0.00667, over 14751.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08914, pruned_loss=0.01196, audio_tagging_loss=0.00846, over 3050519.42 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 09:43:15,912 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575800
2023-11-27 09:43:24,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3838693.3333333335, ans=0.125
2023-11-27 09:43:30,213 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.37 vs. limit=10.0
2023-11-27 09:43:36,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3838760.0, ans=0.1
2023-11-27 09:43:44,271 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10700, loss[loss=0.07022, simple_loss=0.1018, pruned_loss=0.01222, audio_tagging_loss=0.007103, over 15660.00 frames.
], tot_loss[loss=0.06471, simple_loss=0.08863, pruned_loss=0.01192, audio_tagging_loss=0.008479, over 3047933.73 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:43:44,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3838826.6666666665, ans=0.0 2023-11-27 09:43:45,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3838826.6666666665, ans=0.125 2023-11-27 09:43:48,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3838826.6666666665, ans=0.125 2023-11-27 09:44:11,415 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575850 2023-11-27 09:44:21,921 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.744e+01 9.303e+01 9.868e+01 1.046e+02 1.253e+02, threshold=1.974e+02, percent-clipped=0.0 2023-11-27 09:44:39,734 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10750, loss[loss=0.06154, simple_loss=0.08221, pruned_loss=0.01149, audio_tagging_loss=0.00894, over 15272.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08898, pruned_loss=0.01189, audio_tagging_loss=0.008383, over 3045690.80 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:44:45,854 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2023-11-27 09:44:49,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3839226.6666666665, ans=0.125 2023-11-27 09:45:01,027 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.95 vs. limit=15.0 2023-11-27 09:45:05,853 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575900 2023-11-27 09:45:13,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3839360.0, ans=0.125 2023-11-27 09:45:14,865 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:45:16,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3839360.0, ans=0.125 2023-11-27 09:45:26,067 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.24 vs. limit=15.0 2023-11-27 09:45:30,079 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.64 vs. limit=10.0 2023-11-27 09:45:34,363 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10800, loss[loss=0.07088, simple_loss=0.09664, pruned_loss=0.01713, audio_tagging_loss=0.005429, over 16324.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08898, pruned_loss=0.01183, audio_tagging_loss=0.008324, over 3048837.29 frames. 
], batch size: 60, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:45:39,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3839493.3333333335, ans=0.125 2023-11-27 09:45:42,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3839493.3333333335, ans=0.1 2023-11-27 09:46:00,512 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 575950 2023-11-27 09:46:09,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3839693.3333333335, ans=0.2 2023-11-27 09:46:11,269 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.948e+01 8.958e+01 9.647e+01 1.051e+02 1.313e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-27 09:46:28,738 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10850, loss[loss=0.07564, simple_loss=0.1104, pruned_loss=0.01261, audio_tagging_loss=0.007848, over 15321.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08874, pruned_loss=0.01185, audio_tagging_loss=0.008425, over 3042947.61 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:46:34,232 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:46:55,395 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576000 2023-11-27 09:46:57,192 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-576000.pt 2023-11-27 09:47:22,922 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 09:47:25,623 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.84 vs. limit=6.0 2023-11-27 09:47:26,023 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10900, loss[loss=0.06775, simple_loss=0.09054, pruned_loss=0.0151, audio_tagging_loss=0.007377, over 15016.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.089, pruned_loss=0.0119, audio_tagging_loss=0.008477, over 3046420.08 frames. 
], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:47:30,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3840160.0, ans=0.05 2023-11-27 09:47:39,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3840226.6666666665, ans=0.1 2023-11-27 09:47:52,298 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576050 2023-11-27 09:48:00,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3840360.0, ans=0.04949747468305833 2023-11-27 09:48:03,055 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.060e+01 9.173e+01 9.584e+01 1.016e+02 1.234e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-27 09:48:10,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3840426.6666666665, ans=10.0 2023-11-27 09:48:17,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3840426.6666666665, ans=0.125 2023-11-27 09:48:21,470 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 10950, loss[loss=0.05664, simple_loss=0.08317, pruned_loss=0.006754, audio_tagging_loss=0.008301, over 15096.00 frames. ], tot_loss[loss=0.06429, simple_loss=0.08801, pruned_loss=0.01176, audio_tagging_loss=0.008533, over 3051844.78 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:48:25,208 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.10 vs. limit=15.0 2023-11-27 09:48:46,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3840626.6666666665, ans=0.125 2023-11-27 09:48:47,709 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576100 2023-11-27 09:48:48,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3840626.6666666665, ans=0.0 2023-11-27 09:48:55,407 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.29 vs. limit=15.0 2023-11-27 09:49:11,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3840760.0, ans=0.0 2023-11-27 09:49:12,374 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=22.5 2023-11-27 09:49:15,880 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11000, loss[loss=0.06634, simple_loss=0.09094, pruned_loss=0.01247, audio_tagging_loss=0.0084, over 14265.00 frames. ], tot_loss[loss=0.06418, simple_loss=0.0879, pruned_loss=0.01166, audio_tagging_loss=0.008576, over 3048351.01 frames. ], batch size: 52, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:49:19,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3840826.6666666665, ans=0.0 2023-11-27 09:49:24,277 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 09:49:33,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3840893.3333333335, ans=0.04949747468305833 2023-11-27 09:49:42,382 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576150 2023-11-27 09:49:45,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3840960.0, ans=0.125 2023-11-27 09:49:47,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3840960.0, ans=0.1 2023-11-27 09:49:51,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3841026.6666666665, ans=0.2 2023-11-27 09:49:53,300 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.866e+01 8.909e+01 9.397e+01 1.014e+02 1.657e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-27 09:49:54,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3841026.6666666665, ans=0.09899494936611666 2023-11-27 09:50:04,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3841093.3333333335, ans=0.1 2023-11-27 09:50:05,153 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.49 vs. limit=10.0 2023-11-27 09:50:10,561 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11050, loss[loss=0.08094, simple_loss=0.1175, pruned_loss=0.01371, audio_tagging_loss=0.008476, over 16943.00 frames. ], tot_loss[loss=0.06436, simple_loss=0.08805, pruned_loss=0.01159, audio_tagging_loss=0.008742, over 3047816.91 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:50:12,622 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2023-11-27 09:50:16,584 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.36 vs. limit=15.0 2023-11-27 09:50:33,592 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0 2023-11-27 09:50:36,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3841293.3333333335, ans=0.04949747468305833 2023-11-27 09:50:37,179 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576200 2023-11-27 09:51:03,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3841426.6666666665, ans=0.0 2023-11-27 09:51:05,940 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11100, loss[loss=0.06552, simple_loss=0.09348, pruned_loss=0.01008, audio_tagging_loss=0.008691, over 14235.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08874, pruned_loss=0.01186, audio_tagging_loss=0.008763, over 3043790.21 frames. 
], batch size: 54, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:51:12,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3841493.3333333335, ans=0.1 2023-11-27 09:51:17,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3841560.0, ans=0.125 2023-11-27 09:51:17,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3841560.0, ans=0.1 2023-11-27 09:51:24,971 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.82 vs. limit=22.5 2023-11-27 09:51:31,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3841626.6666666665, ans=0.125 2023-11-27 09:51:32,091 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576250 2023-11-27 09:51:44,484 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.822e+01 9.157e+01 9.860e+01 1.054e+02 1.486e+02, threshold=1.972e+02, percent-clipped=0.0 2023-11-27 09:51:55,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3841760.0, ans=0.125 2023-11-27 09:51:55,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3841760.0, ans=0.125 2023-11-27 09:52:00,848 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11150, loss[loss=0.06149, simple_loss=0.08854, pruned_loss=0.009409, audio_tagging_loss=0.007814, over 14936.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08878, pruned_loss=0.01196, audio_tagging_loss=0.008839, over 3044756.15 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:52:11,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3841893.3333333335, ans=0.0 2023-11-27 09:52:26,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3841960.0, ans=0.125 2023-11-27 09:52:27,402 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576300 2023-11-27 09:52:55,664 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11200, loss[loss=0.08821, simple_loss=0.1296, pruned_loss=0.01653, audio_tagging_loss=0.006868, over 14796.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08916, pruned_loss=0.01194, audio_tagging_loss=0.008861, over 3034557.90 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:53:22,311 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576350 2023-11-27 09:53:33,755 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.649e+01 9.019e+01 9.427e+01 1.023e+02 1.335e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-27 09:53:38,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3842426.6666666665, ans=0.125 2023-11-27 09:53:50,655 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11250, loss[loss=0.06966, simple_loss=0.097, pruned_loss=0.0138, audio_tagging_loss=0.007356, over 15770.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08946, pruned_loss=0.01198, audio_tagging_loss=0.008766, over 3038482.63 frames. 
], batch size: 60, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:53:54,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3842493.3333333335, ans=0.125 2023-11-27 09:54:00,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3842560.0, ans=0.0 2023-11-27 09:54:16,337 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576400 2023-11-27 09:54:23,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3842693.3333333335, ans=10.0 2023-11-27 09:54:32,190 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.86 vs. limit=15.0 2023-11-27 09:54:34,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3842760.0, ans=0.125 2023-11-27 09:54:44,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3842760.0, ans=0.04949747468305833 2023-11-27 09:54:45,845 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11300, loss[loss=0.06565, simple_loss=0.08682, pruned_loss=0.01309, audio_tagging_loss=0.009149, over 14195.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08875, pruned_loss=0.01179, audio_tagging_loss=0.008598, over 3034133.04 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:55:09,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3842960.0, ans=0.125 2023-11-27 09:55:11,922 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576450 2023-11-27 09:55:12,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3842960.0, ans=0.125 2023-11-27 09:55:17,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3843026.6666666665, ans=0.2 2023-11-27 09:55:19,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3843026.6666666665, ans=0.0 2023-11-27 09:55:19,332 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:55:24,856 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2023-11-27 09:55:25,402 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.239e+01 9.044e+01 9.676e+01 1.062e+02 1.427e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-27 09:55:27,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3843026.6666666665, ans=0.125 2023-11-27 09:55:40,072 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11350, loss[loss=0.05461, simple_loss=0.07314, pruned_loss=0.008639, audio_tagging_loss=0.009404, over 15584.00 frames. ], tot_loss[loss=0.06412, simple_loss=0.08815, pruned_loss=0.01164, audio_tagging_loss=0.0084, over 3036885.93 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:55:46,495 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.07 vs. 
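limit=15.0

The frequent "ScheduledFloat: name=..., batch_count=..., ans=..." records report the current value of regularization constants (dropout rates, skip rates, balancer probabilities) that are annealed as piecewise-linear functions of the global batch count; "ans" is the interpolated value at the logged batch_count. A minimal sketch of that idea, as a simplification of the scaling.py class rather than its full implementation:

    class ScheduledFloat:
        def __init__(self, *points):
            # points: (batch_count, value) pairs defining a piecewise-linear
            # schedule; the value is held constant outside the given range.
            self.points = sorted(points)
            self.batch_count = 0.0

        def value(self) -> float:
            pts = self.points
            if self.batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if self.batch_count <= x1:
                    # linear interpolation between the surrounding points
                    return y0 + (y1 - y0) * (self.batch_count - x0) / (x1 - x0)
            return pts[-1][1]

    # e.g. a dropout rate annealed from 0.3 to 0.1 over the first 20k batches;
    # at batch_count=3835960.0 the schedule is exhausted, so value() == 0.1,
    # matching the "dropout_p ... ans=0.1" records above (schedule assumed).
    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    dropout_p.batch_count = 3835960.0
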
2023-11-27 09:55:51,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3843226.6666666665, ans=0.0
2023-11-27 09:56:07,288 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576500
2023-11-27 09:56:08,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3843293.3333333335, ans=0.0
2023-11-27 09:56:11,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3843293.3333333335, ans=0.125
2023-11-27 09:56:12,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3843360.0, ans=0.0
2023-11-27 09:56:19,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3843360.0, ans=0.125
2023-11-27 09:56:35,307 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11400, loss[loss=0.06743, simple_loss=0.09901, pruned_loss=0.01273, audio_tagging_loss=0.005194, over 15104.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08937, pruned_loss=0.01182, audio_tagging_loss=0.008322, over 3035450.82 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 09:56:54,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3843560.0, ans=0.2
2023-11-27 09:57:00,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3843626.6666666665, ans=0.125
2023-11-27 09:57:01,490 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576550
2023-11-27 09:57:02,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3843626.6666666665, ans=0.125
2023-11-27 09:57:06,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3843626.6666666665, ans=0.2
2023-11-27 09:57:06,628 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.31 vs. limit=12.0
2023-11-27 09:57:15,548 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.588e+01 9.108e+01 9.687e+01 1.045e+02 1.288e+02, threshold=1.937e+02, percent-clipped=0.0
2023-11-27 09:57:15,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3843693.3333333335, ans=0.125
2023-11-27 09:57:28,011 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.30 vs. limit=15.0
2023-11-27 09:57:29,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3843826.6666666665, ans=0.125
2023-11-27 09:57:30,704 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11450, loss[loss=0.06585, simple_loss=0.0907, pruned_loss=0.01106, audio_tagging_loss=0.009446, over 14579.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08906, pruned_loss=0.0118, audio_tagging_loss=0.008331, over 3043968.13 frames.
], batch size: 55, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 09:57:30,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3843826.6666666665, ans=0.1 2023-11-27 09:57:34,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3843826.6666666665, ans=0.125 2023-11-27 09:57:40,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3843893.3333333335, ans=0.1 2023-11-27 09:57:41,703 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.38 vs. limit=15.0 2023-11-27 09:57:42,454 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:57:48,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3843893.3333333335, ans=0.0 2023-11-27 09:57:56,236 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576600 2023-11-27 09:58:03,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3844026.6666666665, ans=0.125 2023-11-27 09:58:25,203 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11500, loss[loss=0.05259, simple_loss=0.05944, pruned_loss=0.007533, audio_tagging_loss=0.01534, over 14771.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08934, pruned_loss=0.012, audio_tagging_loss=0.008407, over 3040746.79 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 09:58:33,515 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.05 vs. limit=22.5 2023-11-27 09:58:35,210 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:58:39,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3844226.6666666665, ans=0.125 2023-11-27 09:58:47,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3844293.3333333335, ans=0.1 2023-11-27 09:58:52,266 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576650 2023-11-27 09:58:57,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3844360.0, ans=0.0 2023-11-27 09:59:05,732 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.980e+01 9.059e+01 9.598e+01 1.033e+02 1.422e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-27 09:59:13,719 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:59:19,830 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11550, loss[loss=0.06876, simple_loss=0.09039, pruned_loss=0.01342, audio_tagging_loss=0.01015, over 15518.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08995, pruned_loss=0.01206, audio_tagging_loss=0.008329, over 3050097.66 frames. 
], batch size: 59, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 09:59:25,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3844493.3333333335, ans=0.0 2023-11-27 09:59:29,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3844493.3333333335, ans=0.125 2023-11-27 09:59:46,682 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576700 2023-11-27 09:59:54,492 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 10:00:15,321 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11600, loss[loss=0.06817, simple_loss=0.09056, pruned_loss=0.01364, audio_tagging_loss=0.009244, over 15466.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.0902, pruned_loss=0.0121, audio_tagging_loss=0.008399, over 3052918.50 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 10:00:31,939 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.53 vs. limit=15.0 2023-11-27 10:00:41,407 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576750 2023-11-27 10:00:43,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3844960.0, ans=0.0 2023-11-27 10:00:49,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3845026.6666666665, ans=0.125 2023-11-27 10:00:50,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3845026.6666666665, ans=0.125 2023-11-27 10:00:55,893 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.192e+01 9.083e+01 9.816e+01 1.051e+02 1.317e+02, threshold=1.963e+02, percent-clipped=0.0 2023-11-27 10:01:09,979 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11650, loss[loss=0.08011, simple_loss=0.1124, pruned_loss=0.01747, audio_tagging_loss=0.006457, over 15416.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08907, pruned_loss=0.01188, audio_tagging_loss=0.008453, over 3053928.52 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 10:01:36,728 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576800 2023-11-27 10:01:41,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3845293.3333333335, ans=0.1 2023-11-27 10:01:41,385 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 10:01:52,904 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.39 vs. limit=15.0 2023-11-27 10:01:57,894 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.63 vs. 
limit=10.0 2023-11-27 10:02:05,331 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11700, loss[loss=0.07232, simple_loss=0.1002, pruned_loss=0.01312, audio_tagging_loss=0.009108, over 14753.00 frames. ], tot_loss[loss=0.06461, simple_loss=0.08847, pruned_loss=0.01179, audio_tagging_loss=0.008587, over 3045294.50 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 10:02:13,267 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 10:02:15,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3845560.0, ans=0.0 2023-11-27 10:02:20,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3845560.0, ans=0.2 2023-11-27 10:02:31,906 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576850 2023-11-27 10:02:45,923 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.976e+01 8.983e+01 9.560e+01 1.031e+02 1.339e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-27 10:02:59,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3845826.6666666665, ans=0.0 2023-11-27 10:03:00,654 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11750, loss[loss=0.06477, simple_loss=0.08274, pruned_loss=0.01618, audio_tagging_loss=0.00722, over 14940.00 frames. ], tot_loss[loss=0.06439, simple_loss=0.0881, pruned_loss=0.01174, audio_tagging_loss=0.008599, over 3045969.38 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 10:03:08,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3845826.6666666665, ans=0.1 2023-11-27 10:03:13,991 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 10:03:24,193 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.00 vs. limit=10.0 2023-11-27 10:03:25,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3845960.0, ans=0.1 2023-11-27 10:03:26,923 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576900 2023-11-27 10:03:28,437 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.38 vs. limit=6.0 2023-11-27 10:03:39,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3846026.6666666665, ans=0.125 2023-11-27 10:03:44,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3846093.3333333335, ans=0.0 2023-11-27 10:03:50,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3846093.3333333335, ans=0.07 2023-11-27 10:03:54,031 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.54 vs. 
limit=12.0 2023-11-27 10:03:54,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3846160.0, ans=0.2 2023-11-27 10:03:55,766 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11800, loss[loss=0.07796, simple_loss=0.1151, pruned_loss=0.01345, audio_tagging_loss=0.006936, over 15876.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08851, pruned_loss=0.01189, audio_tagging_loss=0.008604, over 3049193.84 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 10:03:58,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3846160.0, ans=0.125 2023-11-27 10:04:02,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3846160.0, ans=0.125 2023-11-27 10:04:03,858 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.79 vs. limit=22.5 2023-11-27 10:04:22,307 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 576950 2023-11-27 10:04:22,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3846293.3333333335, ans=0.125 2023-11-27 10:04:30,958 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 10:04:34,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3846360.0, ans=0.95 2023-11-27 10:04:35,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3846360.0, ans=0.07 2023-11-27 10:04:36,324 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 9.212e+01 9.788e+01 1.058e+02 1.368e+02, threshold=1.958e+02, percent-clipped=0.0 2023-11-27 10:04:50,442 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11850, loss[loss=0.06319, simple_loss=0.08399, pruned_loss=0.01381, audio_tagging_loss=0.007379, over 14452.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08907, pruned_loss=0.01192, audio_tagging_loss=0.008657, over 3048268.31 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 10:05:05,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3846560.0, ans=0.0 2023-11-27 10:05:09,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3846560.0, ans=0.0 2023-11-27 10:05:17,049 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577000 2023-11-27 10:05:46,155 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11900, loss[loss=0.06578, simple_loss=0.09197, pruned_loss=0.01084, audio_tagging_loss=0.008947, over 15382.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08908, pruned_loss=0.01189, audio_tagging_loss=0.008708, over 3041213.68 frames. 
], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 10:05:46,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3846826.6666666665, ans=0.025 2023-11-27 10:05:52,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3846826.6666666665, ans=0.125 2023-11-27 10:05:54,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3846826.6666666665, ans=0.2 2023-11-27 10:06:12,356 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577050 2023-11-27 10:06:13,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3846960.0, ans=0.0 2023-11-27 10:06:24,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3847026.6666666665, ans=0.125 2023-11-27 10:06:26,328 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.851e+01 8.935e+01 9.684e+01 1.048e+02 1.256e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-27 10:06:40,534 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 11950, loss[loss=0.04033, simple_loss=0.0491, pruned_loss=0.005433, audio_tagging_loss=0.01034, over 15336.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08971, pruned_loss=0.01204, audio_tagging_loss=0.008685, over 3041740.08 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 10:06:57,817 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.51 vs. limit=15.0 2023-11-27 10:07:07,315 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577100 2023-11-27 10:07:16,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3847360.0, ans=0.0 2023-11-27 10:07:20,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3847360.0, ans=0.0 2023-11-27 10:07:24,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3847426.6666666665, ans=0.0 2023-11-27 10:07:25,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3847426.6666666665, ans=0.0 2023-11-27 10:07:33,996 INFO [train_asr.py:1235] (0/4) Epoch 48, batch 12000, loss[loss=0.08676, simple_loss=0.1202, pruned_loss=0.0191, audio_tagging_loss=0.007541, over 15617.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08987, pruned_loss=0.01206, audio_tagging_loss=0.008726, over 3043312.62 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 10:07:33,999 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-27 10:07:55,051 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.2342, 4.0629, 3.7872, 3.3681], device='cuda:0') 2023-11-27 10:08:06,007 INFO [train_asr.py:1267] (0/4) Epoch 48, validation: loss=0.05797, simple_loss=0.05046, pruned_loss=0.005369, audio_tagging_loss=0.02737, over 4681554.00 frames. 
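The validation record above closes epoch 48; just below, epoch-48.pt is saved and epoch 49 begins with tot_loss re-accumulated from scratch, which is why it jumps from about 0.065 to the single-batch value 0.07259 at "Epoch 49, batch 0". The component losses throughout these records are consistent with a weighted sum in which the simple transducer loss enters with weight 0.5 and the pruned and audio-tagging losses with weight 1.0; a sketch follows, with the weights inferred from the logged numbers rather than quoted from the code:

    def combined_loss(simple_loss: float, pruned_loss: float,
                      audio_tagging_loss: float) -> float:
        # Weights inferred from the log: each record satisfies
        # loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss.
        return 0.5 * simple_loss + pruned_loss + audio_tagging_loss

    # The epoch-48 validation record above: loss=0.05797 from its parts.
    assert abs(combined_loss(0.05046, 0.005369, 0.02737) - 0.05797) < 1e-4
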
2023-11-27 10:08:06,008 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-27 10:08:08,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3847493.3333333335, ans=0.0 2023-11-27 10:08:08,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3847493.3333333335, ans=0.0 2023-11-27 10:08:09,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3847493.3333333335, ans=0.125 2023-11-27 10:08:31,342 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-48.pt 2023-11-27 10:08:58,286 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 0, loss[loss=0.07259, simple_loss=0.0833, pruned_loss=0.01047, audio_tagging_loss=0.02047, over 14436.00 frames. ], tot_loss[loss=0.07259, simple_loss=0.0833, pruned_loss=0.01047, audio_tagging_loss=0.02047, over 14436.00 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:08:58,287 INFO [train_asr.py:1258] (0/4) Computing validation loss 2023-11-27 10:09:13,020 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([6.3807, 6.0199, 6.3133, 5.7303], device='cuda:0') 2023-11-27 10:09:21,871 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3524, 4.3253, 4.4800, 4.4898], device='cuda:0') 2023-11-27 10:09:25,594 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8222, 4.9629, 5.0919, 4.9144], device='cuda:0') 2023-11-27 10:09:29,263 INFO [train_asr.py:1267] (0/4) Epoch 49, validation: loss=0.05781, simple_loss=0.05038, pruned_loss=0.005301, audio_tagging_loss=0.02732, over 4681554.00 frames. 2023-11-27 10:09:29,264 INFO [train_asr.py:1268] (0/4) Maximum memory allocated so far is 25978MB 2023-11-27 10:09:29,326 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577150 2023-11-27 10:09:29,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3847653.3333333335, ans=0.125 2023-11-27 10:09:42,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3847720.0, ans=0.125 2023-11-27 10:09:42,844 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.973e+01 9.407e+01 1.008e+02 1.108e+02 1.423e+02, threshold=2.015e+02, percent-clipped=0.0 2023-11-27 10:10:21,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3847920.0, ans=0.125 2023-11-27 10:10:23,728 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 50, loss[loss=0.06131, simple_loss=0.06827, pruned_loss=0.00706, audio_tagging_loss=0.02011, over 14762.00 frames. ], tot_loss[loss=0.07008, simple_loss=0.08496, pruned_loss=0.01045, audio_tagging_loss=0.01716, over 685306.90 frames. 
], batch size: 56, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:10:23,792 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577200 2023-11-27 10:10:30,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3847986.6666666665, ans=0.1 2023-11-27 10:10:31,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3847986.6666666665, ans=0.125 2023-11-27 10:10:37,326 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.97 vs. limit=10.0 2023-11-27 10:11:08,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3848253.3333333335, ans=0.125 2023-11-27 10:11:13,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3848253.3333333335, ans=0.1 2023-11-27 10:11:17,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3848253.3333333335, ans=0.2 2023-11-27 10:11:19,480 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 100, loss[loss=0.07656, simple_loss=0.1003, pruned_loss=0.01247, audio_tagging_loss=0.01394, over 15497.00 frames. ], tot_loss[loss=0.07117, simple_loss=0.08731, pruned_loss=0.01132, audio_tagging_loss=0.0162, over 1210083.55 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:11:19,549 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577250 2023-11-27 10:11:23,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3848320.0, ans=0.0 2023-11-27 10:11:34,070 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.548e+01 9.835e+01 1.039e+02 1.086e+02 1.551e+02, threshold=2.079e+02, percent-clipped=0.0 2023-11-27 10:11:46,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3848453.3333333335, ans=0.0 2023-11-27 10:11:58,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3848520.0, ans=0.125 2023-11-27 10:12:09,159 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 10:12:12,176 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.65 vs. limit=22.5 2023-11-27 10:12:14,292 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.20 vs. limit=12.0 2023-11-27 10:12:14,679 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 150, loss[loss=0.06263, simple_loss=0.08693, pruned_loss=0.009691, audio_tagging_loss=0.009474, over 14994.00 frames. ], tot_loss[loss=0.06954, simple_loss=0.08765, pruned_loss=0.01134, audio_tagging_loss=0.01438, over 1617716.22 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:12:14,744 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577300 2023-11-27 10:12:17,461 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.38 vs. 
limit=22.5 2023-11-27 10:12:23,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3848653.3333333335, ans=0.125 2023-11-27 10:12:53,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3848853.3333333335, ans=0.2 2023-11-27 10:13:03,378 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.69 vs. limit=15.0 2023-11-27 10:13:09,314 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 200, loss[loss=0.05647, simple_loss=0.07731, pruned_loss=0.008015, audio_tagging_loss=0.009803, over 15122.00 frames. ], tot_loss[loss=0.06834, simple_loss=0.08812, pruned_loss=0.01158, audio_tagging_loss=0.0127, over 1933531.36 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:13:09,401 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577350 2023-11-27 10:13:22,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3849053.3333333335, ans=0.125 2023-11-27 10:13:23,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3849053.3333333335, ans=0.125 2023-11-27 10:13:25,067 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.806e+01 9.137e+01 9.838e+01 1.045e+02 1.312e+02, threshold=1.968e+02, percent-clipped=0.0 2023-11-27 10:13:33,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3849120.0, ans=0.125 2023-11-27 10:13:41,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3849186.6666666665, ans=0.125 2023-11-27 10:13:45,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3849186.6666666665, ans=0.125 2023-11-27 10:13:47,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3849186.6666666665, ans=0.1 2023-11-27 10:13:51,000 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.22 vs. limit=5.0 2023-11-27 10:13:58,152 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.72 vs. limit=10.0 2023-11-27 10:14:04,760 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 250, loss[loss=0.05993, simple_loss=0.08496, pruned_loss=0.008495, audio_tagging_loss=0.008956, over 15935.00 frames. ], tot_loss[loss=0.06758, simple_loss=0.08883, pruned_loss=0.01168, audio_tagging_loss=0.01148, over 2180585.40 frames. 
], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:14:04,827 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577400 2023-11-27 10:14:08,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3849320.0, ans=0.125 2023-11-27 10:14:15,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3849386.6666666665, ans=0.125 2023-11-27 10:14:35,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3849453.3333333335, ans=0.125 2023-11-27 10:15:00,634 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 300, loss[loss=0.05329, simple_loss=0.07437, pruned_loss=0.008951, audio_tagging_loss=0.007152, over 15042.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.08911, pruned_loss=0.0117, audio_tagging_loss=0.01061, over 2376542.91 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:15:00,698 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577450 2023-11-27 10:15:00,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3849653.3333333335, ans=0.125 2023-11-27 10:15:15,354 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.597e+01 9.252e+01 9.785e+01 1.052e+02 1.385e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-27 10:15:50,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3849920.0, ans=0.035 2023-11-27 10:15:51,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3849920.0, ans=0.0 2023-11-27 10:15:55,342 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 350, loss[loss=0.07239, simple_loss=0.1036, pruned_loss=0.01153, audio_tagging_loss=0.00907, over 15328.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09025, pruned_loss=0.01202, audio_tagging_loss=0.009992, over 2529710.15 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:15:55,415 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577500 2023-11-27 10:16:12,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3850053.3333333335, ans=0.125 2023-11-27 10:16:50,559 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 400, loss[loss=0.06843, simple_loss=0.09533, pruned_loss=0.01091, audio_tagging_loss=0.009853, over 15208.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.08976, pruned_loss=0.0119, audio_tagging_loss=0.009698, over 2647662.76 frames. 
], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:16:50,630 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577550 2023-11-27 10:16:55,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3850320.0, ans=0.125 2023-11-27 10:16:56,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3850320.0, ans=0.125 2023-11-27 10:17:07,269 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.422e+01 8.990e+01 9.603e+01 1.040e+02 1.304e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-27 10:17:24,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3850520.0, ans=0.0 2023-11-27 10:17:34,588 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.11 vs. limit=10.0 2023-11-27 10:17:35,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3850586.6666666665, ans=0.125 2023-11-27 10:17:46,089 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 450, loss[loss=0.06422, simple_loss=0.08521, pruned_loss=0.0125, audio_tagging_loss=0.009117, over 14838.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08892, pruned_loss=0.01189, audio_tagging_loss=0.009448, over 2745518.82 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:17:46,168 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577600 2023-11-27 10:17:46,794 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.27 vs. limit=15.0 2023-11-27 10:17:50,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3850653.3333333335, ans=0.025 2023-11-27 10:17:52,300 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0 2023-11-27 10:18:12,900 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.10 vs. limit=15.0 2023-11-27 10:18:13,122 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.09 vs. limit=15.0 2023-11-27 10:18:23,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3850853.3333333335, ans=0.0 2023-11-27 10:18:40,541 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 500, loss[loss=0.05681, simple_loss=0.08031, pruned_loss=0.01052, audio_tagging_loss=0.006137, over 15632.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08933, pruned_loss=0.01208, audio_tagging_loss=0.009282, over 2811659.77 frames. 
], batch size: 61, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:18:40,623 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577650 2023-11-27 10:18:41,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3850986.6666666665, ans=0.1 2023-11-27 10:18:43,016 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 10:18:56,638 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.792e+01 9.066e+01 9.728e+01 1.042e+02 1.279e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-27 10:18:58,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3851053.3333333335, ans=0.2 2023-11-27 10:19:01,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3851120.0, ans=0.125 2023-11-27 10:19:07,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=3851120.0, ans=22.5 2023-11-27 10:19:16,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3851186.6666666665, ans=0.2 2023-11-27 10:19:17,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3851186.6666666665, ans=0.125 2023-11-27 10:19:25,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3851253.3333333335, ans=0.125 2023-11-27 10:19:33,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3851253.3333333335, ans=0.125 2023-11-27 10:19:35,130 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 550, loss[loss=0.07693, simple_loss=0.1088, pruned_loss=0.01499, audio_tagging_loss=0.007523, over 14923.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08976, pruned_loss=0.01215, audio_tagging_loss=0.009041, over 2865511.91 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:19:35,203 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577700 2023-11-27 10:19:38,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3851320.0, ans=0.125 2023-11-27 10:20:14,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3851520.0, ans=0.2 2023-11-27 10:20:30,439 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 600, loss[loss=0.05574, simple_loss=0.06779, pruned_loss=0.009682, audio_tagging_loss=0.01217, over 15300.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09012, pruned_loss=0.01214, audio_tagging_loss=0.008862, over 2906831.80 frames. 
], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:20:30,500 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577750 2023-11-27 10:20:32,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3851653.3333333335, ans=0.125 2023-11-27 10:20:47,118 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.817e+01 9.017e+01 9.537e+01 1.031e+02 1.710e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 10:20:47,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3851720.0, ans=0.07 2023-11-27 10:20:47,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3851720.0, ans=0.125 2023-11-27 10:20:50,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3851720.0, ans=0.125 2023-11-27 10:20:53,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3851786.6666666665, ans=0.125 2023-11-27 10:21:03,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3851853.3333333335, ans=0.125 2023-11-27 10:21:25,965 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 650, loss[loss=0.07532, simple_loss=0.1109, pruned_loss=0.01285, audio_tagging_loss=0.007042, over 15359.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09095, pruned_loss=0.01231, audio_tagging_loss=0.008829, over 2941043.56 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:21:26,039 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577800 2023-11-27 10:21:34,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3851986.6666666665, ans=0.0 2023-11-27 10:21:45,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3852053.3333333335, ans=15.0 2023-11-27 10:22:03,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3852186.6666666665, ans=0.125 2023-11-27 10:22:09,749 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.81 vs. limit=10.0 2023-11-27 10:22:12,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3852253.3333333335, ans=0.1 2023-11-27 10:22:20,665 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 700, loss[loss=0.0613, simple_loss=0.08741, pruned_loss=0.009847, audio_tagging_loss=0.007754, over 15482.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08986, pruned_loss=0.01207, audio_tagging_loss=0.008844, over 2969200.99 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:22:20,732 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577850 2023-11-27 10:22:22,473 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.07 vs. 
limit=15.0 2023-11-27 10:22:37,744 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 9.117e+01 9.739e+01 1.041e+02 1.243e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-27 10:22:46,227 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.39 vs. limit=22.5 2023-11-27 10:22:53,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3852520.0, ans=0.125 2023-11-27 10:23:13,935 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.85 vs. limit=15.0 2023-11-27 10:23:16,083 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 750, loss[loss=0.07284, simple_loss=0.1017, pruned_loss=0.01223, audio_tagging_loss=0.009749, over 16035.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09067, pruned_loss=0.01218, audio_tagging_loss=0.008875, over 2987349.93 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:23:16,149 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577900 2023-11-27 10:23:21,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3852653.3333333335, ans=0.04949747468305833 2023-11-27 10:23:28,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3852720.0, ans=0.1 2023-11-27 10:23:31,236 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.13 vs. limit=15.0 2023-11-27 10:23:46,806 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.88 vs. limit=15.0 2023-11-27 10:23:56,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3852853.3333333335, ans=0.125 2023-11-27 10:24:11,160 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 800, loss[loss=0.07231, simple_loss=0.09923, pruned_loss=0.01393, audio_tagging_loss=0.008766, over 16064.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08972, pruned_loss=0.01206, audio_tagging_loss=0.008947, over 3003742.42 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:24:11,233 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 577950 2023-11-27 10:24:26,901 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.508e+01 9.085e+01 9.807e+01 1.032e+02 1.313e+02, threshold=1.961e+02, percent-clipped=0.0 2023-11-27 10:24:35,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3853120.0, ans=0.5 2023-11-27 10:24:36,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3853120.0, ans=0.125 2023-11-27 10:24:50,833 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.02 vs. limit=15.0 2023-11-27 10:24:56,541 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.18 vs. 
limit=15.0 2023-11-27 10:25:05,466 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 850, loss[loss=0.06146, simple_loss=0.08943, pruned_loss=0.009686, audio_tagging_loss=0.007061, over 15176.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09049, pruned_loss=0.0121, audio_tagging_loss=0.008875, over 3019717.72 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:25:05,528 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578000 2023-11-27 10:25:26,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3853386.6666666665, ans=0.125 2023-11-27 10:25:57,510 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=22.5 2023-11-27 10:26:01,083 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 900, loss[loss=0.0558, simple_loss=0.07522, pruned_loss=0.008504, audio_tagging_loss=0.00969, over 15023.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09022, pruned_loss=0.01215, audio_tagging_loss=0.00886, over 3030500.04 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:26:01,154 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578050 2023-11-27 10:26:10,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3853653.3333333335, ans=0.125 2023-11-27 10:26:15,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3853720.0, ans=0.125 2023-11-27 10:26:19,312 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.811e+01 9.242e+01 9.846e+01 1.086e+02 1.686e+02, threshold=1.969e+02, percent-clipped=0.0 2023-11-27 10:26:53,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3853920.0, ans=0.2 2023-11-27 10:26:53,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3853920.0, ans=0.0 2023-11-27 10:26:54,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3853920.0, ans=0.125 2023-11-27 10:26:56,610 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 950, loss[loss=0.05407, simple_loss=0.06631, pruned_loss=0.01182, audio_tagging_loss=0.009096, over 15064.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08995, pruned_loss=0.012, audio_tagging_loss=0.008797, over 3035273.46 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:26:56,680 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578100 2023-11-27 10:27:08,207 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.35 vs. 
limit=15.0 2023-11-27 10:27:26,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3854120.0, ans=0.0 2023-11-27 10:27:35,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3854186.6666666665, ans=0.2 2023-11-27 10:27:38,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=3854186.6666666665, ans=0.1 2023-11-27 10:27:51,641 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1000, loss[loss=0.05635, simple_loss=0.07666, pruned_loss=0.01124, audio_tagging_loss=0.006786, over 15778.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08991, pruned_loss=0.01212, audio_tagging_loss=0.008687, over 3041524.49 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:27:51,714 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578150 2023-11-27 10:28:04,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3854386.6666666665, ans=0.0 2023-11-27 10:28:09,320 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.477e+01 9.145e+01 9.757e+01 1.033e+02 1.378e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-27 10:28:15,059 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 10:28:25,757 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 10:28:46,180 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1050, loss[loss=0.04381, simple_loss=0.05156, pruned_loss=0.005175, audio_tagging_loss=0.01285, over 15407.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08978, pruned_loss=0.01206, audio_tagging_loss=0.008618, over 3040430.93 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:28:46,247 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578200 2023-11-27 10:28:47,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3854653.3333333335, ans=0.0 2023-11-27 10:29:01,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3854720.0, ans=0.2 2023-11-27 10:29:04,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3854720.0, ans=0.125 2023-11-27 10:29:16,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3854786.6666666665, ans=0.0 2023-11-27 10:29:38,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3854920.0, ans=0.5 2023-11-27 10:29:41,660 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1100, loss[loss=0.07039, simple_loss=0.09667, pruned_loss=0.009665, audio_tagging_loss=0.01238, over 16351.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08969, pruned_loss=0.01205, audio_tagging_loss=0.008615, over 3041904.06 frames. 
], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:29:41,728 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578250 2023-11-27 10:29:43,840 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 10:29:45,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3854986.6666666665, ans=0.2 2023-11-27 10:29:50,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3854986.6666666665, ans=0.1 2023-11-27 10:29:58,968 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.766e+01 8.982e+01 9.681e+01 1.049e+02 1.414e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-27 10:30:00,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3855053.3333333335, ans=0.0 2023-11-27 10:30:03,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3855120.0, ans=0.125 2023-11-27 10:30:19,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3855186.6666666665, ans=0.1 2023-11-27 10:30:29,625 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.03 vs. limit=22.5 2023-11-27 10:30:36,810 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1150, loss[loss=0.04082, simple_loss=0.05351, pruned_loss=0.004067, audio_tagging_loss=0.009997, over 15241.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.09007, pruned_loss=0.01199, audio_tagging_loss=0.008452, over 3050706.33 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:30:36,883 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578300 2023-11-27 10:30:41,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3855320.0, ans=0.125 2023-11-27 10:30:46,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3855386.6666666665, ans=0.0 2023-11-27 10:31:11,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3855520.0, ans=0.125 2023-11-27 10:31:12,074 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 10:31:12,187 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=15.0 2023-11-27 10:31:23,404 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.36 vs. 
limit=15.0 2023-11-27 10:31:23,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3855586.6666666665, ans=0.1 2023-11-27 10:31:28,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3855586.6666666665, ans=0.125 2023-11-27 10:31:31,597 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1200, loss[loss=0.07408, simple_loss=0.106, pruned_loss=0.01257, audio_tagging_loss=0.008492, over 15756.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08913, pruned_loss=0.01177, audio_tagging_loss=0.008517, over 3051603.97 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:31:31,660 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578350 2023-11-27 10:31:34,215 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.69 vs. limit=15.0 2023-11-27 10:31:42,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3855720.0, ans=0.0 2023-11-27 10:31:49,417 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 9.092e+01 9.675e+01 1.031e+02 1.166e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-27 10:31:53,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3855786.6666666665, ans=0.125 2023-11-27 10:32:06,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3855853.3333333335, ans=0.125 2023-11-27 10:32:13,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3855853.3333333335, ans=0.125 2023-11-27 10:32:22,718 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 10:32:27,241 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1250, loss[loss=0.06371, simple_loss=0.08676, pruned_loss=0.01087, audio_tagging_loss=0.009463, over 15056.00 frames. ], tot_loss[loss=0.06439, simple_loss=0.08836, pruned_loss=0.01167, audio_tagging_loss=0.008534, over 3051750.28 frames. 
], batch size: 56, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:32:27,305 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578400 2023-11-27 10:32:40,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3856053.3333333335, ans=0.1 2023-11-27 10:32:58,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3856120.0, ans=0.09899494936611666 2023-11-27 10:33:13,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3856253.3333333335, ans=0.125 2023-11-27 10:33:13,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3856253.3333333335, ans=0.0 2023-11-27 10:33:16,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3856253.3333333335, ans=0.125 2023-11-27 10:33:19,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3856253.3333333335, ans=0.125 2023-11-27 10:33:21,807 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1300, loss[loss=0.05632, simple_loss=0.06787, pruned_loss=0.008513, audio_tagging_loss=0.01387, over 15694.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08931, pruned_loss=0.01172, audio_tagging_loss=0.008438, over 3061238.61 frames. ], batch size: 62, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:33:21,874 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578450 2023-11-27 10:33:30,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3856320.0, ans=0.125 2023-11-27 10:33:35,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3856386.6666666665, ans=0.0 2023-11-27 10:33:37,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3856386.6666666665, ans=0.0 2023-11-27 10:33:39,549 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.313e+01 9.062e+01 9.714e+01 1.030e+02 1.237e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-27 10:33:52,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3856453.3333333335, ans=0.125 2023-11-27 10:34:08,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3856586.6666666665, ans=0.0 2023-11-27 10:34:15,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3856653.3333333335, ans=0.0 2023-11-27 10:34:17,146 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1350, loss[loss=0.05244, simple_loss=0.06981, pruned_loss=0.008443, audio_tagging_loss=0.009095, over 14954.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08938, pruned_loss=0.01183, audio_tagging_loss=0.0085, over 3052420.54 frames. 
], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:34:17,220 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578500 2023-11-27 10:34:18,788 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.12 vs. limit=12.0 2023-11-27 10:34:21,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3856653.3333333335, ans=0.2 2023-11-27 10:34:23,641 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 10:34:30,417 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.29 vs. limit=12.0 2023-11-27 10:34:40,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3856786.6666666665, ans=0.07 2023-11-27 10:34:41,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3856786.6666666665, ans=0.125 2023-11-27 10:34:43,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3856786.6666666665, ans=0.125 2023-11-27 10:34:49,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3856853.3333333335, ans=0.125 2023-11-27 10:34:55,364 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 10:35:08,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3856920.0, ans=0.125 2023-11-27 10:35:12,560 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1400, loss[loss=0.06142, simple_loss=0.0855, pruned_loss=0.01129, audio_tagging_loss=0.007382, over 14205.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08932, pruned_loss=0.01188, audio_tagging_loss=0.008492, over 3059276.50 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:35:12,631 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578550 2023-11-27 10:35:30,482 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.412e+01 9.274e+01 9.843e+01 1.071e+02 1.343e+02, threshold=1.969e+02, percent-clipped=0.0 2023-11-27 10:35:35,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3857120.0, ans=0.0 2023-11-27 10:35:49,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3857186.6666666665, ans=0.125 2023-11-27 10:35:54,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3857186.6666666665, ans=0.1 2023-11-27 10:36:07,251 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1450, loss[loss=0.05032, simple_loss=0.07365, pruned_loss=0.007104, audio_tagging_loss=0.006393, over 15040.00 frames. 
], tot_loss[loss=0.06516, simple_loss=0.0894, pruned_loss=0.01192, audio_tagging_loss=0.008539, over 3060883.71 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:36:07,320 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578600 2023-11-27 10:36:07,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3857320.0, ans=0.0 2023-11-27 10:36:52,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3857586.6666666665, ans=0.1 2023-11-27 10:36:58,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3857586.6666666665, ans=0.0 2023-11-27 10:37:02,052 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1500, loss[loss=0.0577, simple_loss=0.07751, pruned_loss=0.008838, audio_tagging_loss=0.01011, over 14547.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08861, pruned_loss=0.01193, audio_tagging_loss=0.008616, over 3054951.38 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:37:02,117 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578650 2023-11-27 10:37:15,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3857720.0, ans=10.0 2023-11-27 10:37:21,431 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.975e+01 9.188e+01 9.715e+01 1.038e+02 1.214e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-27 10:37:23,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3857786.6666666665, ans=0.0 2023-11-27 10:37:40,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3857853.3333333335, ans=0.1 2023-11-27 10:37:56,066 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.91 vs. limit=22.5 2023-11-27 10:37:57,667 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1550, loss[loss=0.06551, simple_loss=0.09142, pruned_loss=0.01104, audio_tagging_loss=0.008762, over 15797.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08902, pruned_loss=0.01195, audio_tagging_loss=0.008658, over 3049869.50 frames. 
], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:37:57,736 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578700 2023-11-27 10:38:10,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3858053.3333333335, ans=0.2 2023-11-27 10:38:15,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3858053.3333333335, ans=0.125 2023-11-27 10:38:29,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3858186.6666666665, ans=0.2 2023-11-27 10:38:31,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3858186.6666666665, ans=0.0 2023-11-27 10:38:46,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3858253.3333333335, ans=0.1 2023-11-27 10:38:52,574 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1600, loss[loss=0.07726, simple_loss=0.1191, pruned_loss=0.01226, audio_tagging_loss=0.005471, over 16299.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08956, pruned_loss=0.01196, audio_tagging_loss=0.008742, over 3051563.44 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:38:52,641 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578750 2023-11-27 10:39:10,777 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.536e+01 9.050e+01 9.679e+01 1.052e+02 1.346e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-27 10:39:15,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3858453.3333333335, ans=0.0 2023-11-27 10:39:21,512 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 10:39:30,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3858520.0, ans=0.0 2023-11-27 10:39:31,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3858520.0, ans=0.0 2023-11-27 10:39:46,737 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1650, loss[loss=0.08193, simple_loss=0.1124, pruned_loss=0.01473, audio_tagging_loss=0.01101, over 15182.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08903, pruned_loss=0.01192, audio_tagging_loss=0.008779, over 3047474.96 frames. 
], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:39:46,812 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578800 2023-11-27 10:39:52,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3858653.3333333335, ans=0.0 2023-11-27 10:40:17,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3858786.6666666665, ans=0.125 2023-11-27 10:40:17,613 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 10:40:37,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3858920.0, ans=0.125 2023-11-27 10:40:38,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3858920.0, ans=0.0 2023-11-27 10:40:40,191 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.71 vs. limit=15.0 2023-11-27 10:40:43,187 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1700, loss[loss=0.07071, simple_loss=0.09341, pruned_loss=0.01377, audio_tagging_loss=0.01023, over 15781.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08937, pruned_loss=0.01185, audio_tagging_loss=0.008822, over 3058060.18 frames. ], batch size: 61, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:40:43,261 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578850 2023-11-27 10:40:52,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3858986.6666666665, ans=0.125 2023-11-27 10:40:54,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3859053.3333333335, ans=0.125 2023-11-27 10:40:55,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3859053.3333333335, ans=0.125 2023-11-27 10:41:02,528 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.708e+01 9.167e+01 9.822e+01 1.054e+02 1.344e+02, threshold=1.964e+02, percent-clipped=0.0 2023-11-27 10:41:10,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3859120.0, ans=0.125 2023-11-27 10:41:20,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3859186.6666666665, ans=0.1 2023-11-27 10:41:22,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3859186.6666666665, ans=0.125 2023-11-27 10:41:38,514 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1750, loss[loss=0.08785, simple_loss=0.1235, pruned_loss=0.01826, audio_tagging_loss=0.007838, over 15626.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08921, pruned_loss=0.01174, audio_tagging_loss=0.008752, over 3055368.86 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:41:38,587 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578900 2023-11-27 10:41:46,419 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.31 vs. 
limit=22.5 2023-11-27 10:42:14,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3859520.0, ans=0.125 2023-11-27 10:42:17,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3859520.0, ans=0.0 2023-11-27 10:42:29,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3859586.6666666665, ans=0.125 2023-11-27 10:42:32,883 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1800, loss[loss=0.0757, simple_loss=0.1028, pruned_loss=0.01253, audio_tagging_loss=0.01177, over 15377.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08955, pruned_loss=0.01166, audio_tagging_loss=0.008645, over 3057273.31 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:42:32,953 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 578950 2023-11-27 10:42:50,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3859720.0, ans=0.125 2023-11-27 10:42:53,799 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.995e+01 9.639e+01 1.040e+02 1.222e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-27 10:43:01,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3859786.6666666665, ans=0.125 2023-11-27 10:43:09,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3859853.3333333335, ans=0.125 2023-11-27 10:43:14,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3859853.3333333335, ans=0.125 2023-11-27 10:43:14,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3859853.3333333335, ans=0.125 2023-11-27 10:43:28,240 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1850, loss[loss=0.07907, simple_loss=0.1128, pruned_loss=0.01616, audio_tagging_loss=0.006506, over 15240.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08947, pruned_loss=0.0117, audio_tagging_loss=0.008579, over 3052161.37 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:43:28,318 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 579000 2023-11-27 10:43:29,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3859986.6666666665, ans=0.125 2023-11-27 10:43:32,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3859986.6666666665, ans=0.125 2023-11-27 10:43:37,656 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.92 vs. limit=15.0 2023-11-27 10:43:40,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3860053.3333333335, ans=0.1 2023-11-27 10:43:46,143 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.86 vs. limit=12.0 2023-11-27 10:43:53,481 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.24 vs. 
limit=22.5 2023-11-27 10:43:54,413 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.57 vs. limit=12.0 2023-11-27 10:43:56,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3860120.0, ans=0.125 2023-11-27 10:43:58,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3860120.0, ans=0.125 2023-11-27 10:44:01,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3860186.6666666665, ans=0.025 2023-11-27 10:44:07,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3860186.6666666665, ans=0.0 2023-11-27 10:44:11,838 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2023-11-27 10:44:23,750 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1900, loss[loss=0.05037, simple_loss=0.06845, pruned_loss=0.006036, audio_tagging_loss=0.0101, over 16340.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08954, pruned_loss=0.01167, audio_tagging_loss=0.008494, over 3051698.99 frames. ], batch size: 64, lr: 1.38e-03, grad_scale: 8.0 2023-11-27 10:44:23,830 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 579050 2023-11-27 10:44:41,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3860386.6666666665, ans=15.0 2023-11-27 10:44:44,338 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.344e+01 9.131e+01 9.734e+01 1.046e+02 1.295e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-27 10:44:46,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3860453.3333333335, ans=0.0 2023-11-27 10:44:54,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3860453.3333333335, ans=0.2 2023-11-27 10:44:54,892 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.80 vs. limit=10.0 2023-11-27 10:44:59,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3860520.0, ans=0.125 2023-11-27 10:45:01,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3860520.0, ans=0.0 2023-11-27 10:45:18,684 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 1950, loss[loss=0.05322, simple_loss=0.07589, pruned_loss=0.007327, audio_tagging_loss=0.00795, over 16694.00 frames. ], tot_loss[loss=0.06469, simple_loss=0.08907, pruned_loss=0.01168, audio_tagging_loss=0.008468, over 3049995.62 frames. ], batch size: 63, lr: 1.38e-03, grad_scale: 8.0 2023-11-27 10:45:18,762 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 579100 2023-11-27 10:45:38,191 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.28 vs. 
limit=15.0 2023-11-27 10:45:45,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3860786.6666666665, ans=0.125 2023-11-27 10:46:05,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3860920.0, ans=0.125 2023-11-27 10:46:13,752 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 2000, loss[loss=0.09001, simple_loss=0.1387, pruned_loss=0.01567, audio_tagging_loss=0.004967, over 15315.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08909, pruned_loss=0.01171, audio_tagging_loss=0.00844, over 3043028.35 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:46:13,827 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 579150 2023-11-27 10:46:22,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3860986.6666666665, ans=0.2 2023-11-27 10:46:32,906 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.70 vs. limit=15.0 2023-11-27 10:46:32,941 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.94 vs. limit=15.0 2023-11-27 10:46:35,553 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.005e+01 8.839e+01 9.475e+01 1.022e+02 1.680e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-27 10:46:35,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3861120.0, ans=0.125 2023-11-27 10:47:00,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3861253.3333333335, ans=0.0 2023-11-27 10:47:09,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3861320.0, ans=0.1 2023-11-27 10:47:10,264 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 2050, loss[loss=0.05711, simple_loss=0.0732, pruned_loss=0.008813, audio_tagging_loss=0.01169, over 15442.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08936, pruned_loss=0.01185, audio_tagging_loss=0.008462, over 3042737.05 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:47:10,338 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 579200 2023-11-27 10:47:11,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3861320.0, ans=0.125 2023-11-27 10:47:22,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3861386.6666666665, ans=0.0 2023-11-27 10:47:44,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3861520.0, ans=0.125 2023-11-27 10:47:54,191 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.30 vs. limit=22.5 2023-11-27 10:48:07,619 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 2100, loss[loss=0.0761, simple_loss=0.1054, pruned_loss=0.01643, audio_tagging_loss=0.006953, over 15830.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08989, pruned_loss=0.01195, audio_tagging_loss=0.008423, over 3050164.76 frames. 
], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:48:07,711 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 579250 2023-11-27 10:48:28,804 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.817e+01 8.950e+01 9.629e+01 1.055e+02 1.441e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-27 10:48:35,957 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 10:48:46,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3861853.3333333335, ans=0.5 2023-11-27 10:48:54,821 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.14 vs. limit=10.0 2023-11-27 10:49:03,227 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 2150, loss[loss=0.08105, simple_loss=0.1029, pruned_loss=0.0198, audio_tagging_loss=0.009821, over 14175.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.0904, pruned_loss=0.012, audio_tagging_loss=0.008396, over 3049728.54 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:49:03,331 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 579300 2023-11-27 10:49:11,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3861986.6666666665, ans=0.125 2023-11-27 10:49:15,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3862053.3333333335, ans=0.125 2023-11-27 10:49:16,403 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.84 vs. limit=22.5 2023-11-27 10:49:18,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3862053.3333333335, ans=10.0 2023-11-27 10:49:28,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3862120.0, ans=0.125 2023-11-27 10:49:31,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3862120.0, ans=0.125 2023-11-27 10:49:35,562 WARNING [train_asr.py:1481] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 10:49:36,102 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.55 vs. 
limit=22.5 2023-11-27 10:49:49,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3862253.3333333335, ans=0.0 2023-11-27 10:49:55,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3862253.3333333335, ans=0.125 2023-11-27 10:49:59,853 INFO [train_asr.py:1235] (0/4) Epoch 49, batch 2200, loss[loss=0.07555, simple_loss=0.1038, pruned_loss=0.01589, audio_tagging_loss=0.007744, over 16453.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08985, pruned_loss=0.01197, audio_tagging_loss=0.008473, over 3046984.86 frames. ], batch size: 61, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:50:00,018 INFO [model.py:807] (0/4) Freeze_encoder: False; Current batch idx: 579350
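For readers decoding the per-batch records above: each tot_loss[...] entry reports the weighted training objective, and the logged numbers are self-consistent under the weighting configured for this run (simple_loss_scale=0.5, audio_tagging_loss_scale=1.0, CTC disabled), i.e. loss ~= 0.5*simple_loss + pruned_loss + audio_tagging_loss. Below is a minimal Python sketch for checking a record offline; the regex and helper names are illustrative assumptions for this note, not taken from the icefall codebase.

import re

# Matches the "loss=..., simple_loss=..., pruned_loss=..., audio_tagging_loss=..."
# fields that train_asr.py prints for both the per-batch loss and tot_loss.
LOSS_RE = re.compile(
    r"loss=(?P<loss>[\d.]+), simple_loss=(?P<simple>[\d.]+), "
    r"pruned_loss=(?P<pruned>[\d.]+), audio_tagging_loss=(?P<at>[\d.]+)"
)

def check_loss_decomposition(record: str,
                             simple_loss_scale: float = 0.5,
                             audio_tagging_loss_scale: float = 1.0,
                             tol: float = 5e-4) -> bool:
    """Return True if the logged 'loss' equals the weighted sum of its parts."""
    m = LOSS_RE.search(record)
    if m is None:
        return False
    expected = (simple_loss_scale * float(m["simple"])
                + float(m["pruned"])
                + audio_tagging_loss_scale * float(m["at"]))
    return abs(expected - float(m["loss"])) < tol

def looks_too_short(frames_after_subsampling: int, num_tokens: int) -> bool:
    """Exclusion rule suggested by the WARNING lines above: a cut is dropped
    when it has fewer subsampled frames than BPE tokens (23 < 24 for the
    dummy AudioSet placeholder transcripts)."""
    return frames_after_subsampling < num_tokens

# Example against the "Epoch 49, batch 250" record:
rec = ("tot_loss[loss=0.06758, simple_loss=0.08883, "
       "pruned_loss=0.01168, audio_tagging_loss=0.01148]")
assert check_loss_decomposition(rec)   # 0.5*0.08883 + 0.01168 + 0.01148 ~= 0.06758
assert looks_too_short(23, 24)         # matches the excluded dummy AudioSet cuts

The same decomposition holds for every tot_loss record in this stretch of the log (e.g. batch 300: 0.5*0.08911 + 0.0117 + 0.01061 ~= 0.06687), which is a quick way to verify a parsed log line before plotting loss curves.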